10

Testing and Quality Automation

Software is complex. No matter what language you use, what frameworks you build on, or how elegant your coding style is, it is hard to verify software correctness just by reading the code. That's not only because non-trivial applications usually consist of large amounts of code. It is also because complete software is often composed of many layers and relies on many external or interchangeable components, such as operating systems, libraries, databases, caches, web APIs, or clients used to interact with your code (browsers, for instance).

The complexity of modern software means that the verification of its correctness often requires you to go beyond your code. You need to consider the environment in which your code runs, variations of components that can be replaced, and the ways your code can be interacted with. That's why developers of high-quality software often employ special testing techniques that allow them to quickly and reliably verify that the code they write meets desired acceptance criteria.

Another concern of complex software is its maintainability. This can be understood as how easy it is to sustain the ongoing development of a piece of software. And development is not only about implementing new features or enhancements but also about diagnosing and fixing issues that will inevitably be discovered along the way. Maintainable software is software that requires little effort to change and carries a low risk of introducing new defects with each change.

As you can probably guess, maintainability is a product of many software aspects. Automated testing of course helps to reduce the risk of change by ensuring that known use cases are properly covered by existing and future code. But it is not enough to ensure that future changes will be easy to implement. That's why modern testing methodologies also rely on automated code quality measurement and testing to enforce specific coding conventions, highlight potentially erroneous code fragments, or scan for security vulnerabilities.

The modern testing landscape is vast. It is easy to get lost in a sea of testing methodologies, tools, frameworks, libraries, and utilities. That's why in this chapter we will review the most popular testing and quality automation techniques that are often employed by professional Python developers. This should give you a good overview of what's generally possible and also allow you to build your own testing routine. We will cover the following topics:

  • The principles of test-driven development
  • Writing tests with pytest
  • Quality automation
  • Mutation testing
  • Useful testing utilities

We will use a lot of packages from PyPI, so let's start by considering the technical requirements for this chapter.

Technical requirements

The following are the Python packages that are mentioned in this chapter, which you can download from PyPI:

  • pytest
  • redis
  • coverage
  • mypy
  • mutmut
  • faker
  • freezegun

Information on how to install packages is included in Chapter 2, Modern Python Development Environments.

The code files for this chapter can be found at https://github.com/PacktPublishing/Expert-Python-Programming-Fourth-Edition/tree/main/Chapter%2010.

The principles of test-driven development

Testing is one of the most important elements of the software development process. It is so important that there is even a software development methodology called Test-Driven Development (TDD). It advocates writing software requirements as tests as the first (and foremost) step in developing code.

The principle is simple: you focus on the tests first. Use them to describe the behavior of the software, verify it, and check for potential errors. Only when those tests are complete should you proceed with the actual implementation to satisfy the tests.

TDD, in its simplest form, is an iterative process that consists of the following steps:

  1. Write tests: Tests should reflect the specification of a functionality or improvement that has not been implemented yet.
  2. Run tests: At this stage all new tests should fail as the feature or improvement is not yet implemented.
  3. Write a minimal valid implementation: The code should be dead simple but correct. It is OK if it does not look elegant or has performance issues. The main focus at this stage should be satisfying all the tests written in step 1. It is also easier to diagnose problems in code that is simple than in code that is optimized for performance.
  4. Run tests: At this stage all tests should pass. That includes both new and preexisting tests. If any of them fail, the code should be revised until it satisfies the requirements.
  5. Hone and polish: When all tests are satisfied, the code can be progressively refactored until it meets desired quality standards. This is the time for streamlining, refactoring, and sometimes obvious optimizations (if you have brute-forced your way through the problem). After each change, all tests should be rerun to ensure that no functionality was broken.

This simple process allows you to iteratively extend your application without worrying that a new change will break some preexisting and tested functionality. It also helps you avoid premature optimization and guides you through development in a series of simple, bite-sized steps.

TDD won't deliver the promised results without proper work hygiene. That's why it is important to follow some basic principles:

  • Keep the size of the tested unit small: In TDD we often talk about units of code and unit tests. A unit of code is a simple autonomous piece of software that (preferably) should do only one thing, and a single unit test should exercise one function or method with a single set of arguments. This makes writing tests easier but also favors good development practices and patterns like the single responsibility principle and inversion of control (see Chapter 5, Interfaces, Patterns, and Modularity).
  • Keep tests small and focused: It is almost always better to create many small and simple tests than one long and elaborate test. Every test should verify only one aspect/requirement of the intended functionality. Having granular tests allows for easier diagnosing of potential issues and better maintainability of the test suite. Small tests pinpoint problems better and are just easier to read.
  • Keep tests isolated and independent: The success of one test should not rely on the specific order of execution of all tests within the test suite. If a test relies on a specific state of the execution environment, the test itself should ensure that all preconditions are satisfied. Similarly, any side effects of the test should be cleaned up after the execution. Those preparation and cleanup phases of every test are also known as setup and teardown.

These few principles will help you write tests that are easy to understand and maintain. And that's important because writing tests is yet another activity that takes time and increases the initial development cost. Still, when done right, it is an investment that pays off pretty quickly. A systematic and automated testing routine reduces the number of software defects that would otherwise reach the end users. It also provides a framework for the verification of known software bugs.

A rigorous testing routine and following a few basic principles is usually supported by a dedicated testing library or framework. Python programmers are really lucky because the Python standard library comes with two built-in modules created exactly for the purpose of automated tests. These are:

  • doctest: A module for testing interactive code examples found in docstrings. It is a convenient way of merging documentation with tests. doctest is theoretically capable of handling unit tests, but it is more often used to ensure that the code snippets found in docstrings reflect correct usage examples (see the short sketch after this list).

    You can read more about doctest in the official documentation found at https://docs.python.org/3/library/doctest.html.

  • unittest: A full-fledged testing framework inspired by JUnit (a popular Java testing framework). It allows for the organization of tests into test cases and test suites and provides common ways for managing setup and teardown primitives. unittest comes with a built-in test runner that is able to discover test modules across the whole codebase and execute specific test selections.

    You can read more about unittest in the official documentation found at https://docs.python.org/3/library/unittest.html.
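
To give a rough feel for both modules, the following is a minimal, self-contained sketch (the fahrenheit_to_celsius() function and the module names are made up for illustration): a docstring example that doctest can verify, followed by an equivalent unittest test case:

# conversions.py -- hypothetical module; check with: python -m doctest -v conversions.py
def fahrenheit_to_celsius(degrees: float) -> float:
    """Convert temperature in Fahrenheit degrees to Celsius.

    >>> fahrenheit_to_celsius(32)
    0.0
    >>> fahrenheit_to_celsius(212)
    100.0
    """
    return (degrees - 32) * 5 / 9


# test_conversions.py -- the same check written as a unittest test case
import unittest

from conversions import fahrenheit_to_celsius


class FahrenheitToCelsiusTest(unittest.TestCase):
    def test_freezing_and_boiling_points(self):
        self.assertEqual(fahrenheit_to_celsius(32), 0.0)
        self.assertEqual(fahrenheit_to_celsius(212), 100.0)


if __name__ == "__main__":
    unittest.main()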

These two modules together can satisfy most of the testing needs of even the most demanding developers. Unfortunately, doctest concentrates on a very specific use case for tests (the testing of code examples) and unittest requires a rather large amount of boilerplate due to class-oriented test organization. Its runner also isn't as flexible as it could be. That's why many professional programmers prefer using one of the third-party frameworks available on PyPI.

One such framework is pytest. It is probably one of the best and most mature Python testing frameworks out there. It offers a more convenient way of organizing tests as flat modules with test functions (instead of classes) but is also compatible with the unittest class-based test hierarchy. It also has a truly superior test runner and comes with a multitude of optional extensions.

The above advantages of pytest are the reason why we are not going to discuss the details of unittest and doctest usage. They are still great and useful but pytest is almost always a better and more practical choice. That's why we are now going to discuss examples of writing tests using pytest as our framework of choice.

Writing tests with pytest

Now it's time to put the theory into practice. We already know the advantages of TDD, so we'll try to build something simple with the help of tests. We will discuss the anatomy of a typical test and then go over common testing techniques and tools that are often employed by professional Python programmers. All of that will be done with the help of the pytest testing framework.

In order to do that we will require some problems to solve. After all, testing starts at the very beginning of the software development life cycle—when the software requirements are defined. In many testing methodologies, tests are just a code-native way of describing software requirements in executable form.

It's hard to find a single convincing programming challenge that would allow for showcasing a variety of testing techniques and at the same time would fit into a book format. That's why we are going to discuss a few small and unrelated problems instead. We will also revisit some of the examples found in previous chapters of the book.

From a TDD perspective, writing tests for existing code is of course an unorthodox approach to testing, as tests ideally should precede the implementation and not vice versa. But it is a known practice. For professional programmers it is not uncommon to inherit a piece of software that is poorly tested or has not been tested at all. In such a situation, if you want to test your software reliably, you will eventually have to do the missing work. In our case, writing some tests for preexisting code will also be an interesting opportunity to talk about the challenges of writing tests after the code.

The first example will be pretty simple. It will allow us to understand the basic anatomy of a test and how to use the pytest runner to discover and run tests. Our task will be to create a function that:

  • Accepts an iterable of elements and a batch size
  • Returns an iterable of sub-lists where every sub-list is a batch of consecutive elements from the source list. The order of elements should stay the same
  • Each batch has the same size
  • If the source list does not have enough elements to fill the last batch, that batch should be shorter but never empty

That would be a relatively small but useful function. It could for instance be used to process large streams of data without needing to load them fully into process memory. It could also be used for distributing chunks of work to separate threads or process workers, as we learned in Chapter 6, Concurrency.

Let's start by writing the stub of our function to find out what we are working with. It will be named batches() and will be hosted in a file called batch.py. The signature can be as follows:

from typing import Any, Iterable, List
def batches(
    iterable: Iterable[Any], batch_size: int
) -> Iterable[List[Any]]:
    pass

We haven't provided any implementation yet as this is something we will take care of once the tests are done. We can see typing annotations that constitute part of the contract between the function and the caller.

Once we have done this, we are able to import our function into the test module to write the tests. The common convention for naming test modules is test_<module-name>.py, where <module-name> is the name of the module whose contents we are going to test. Let's create a file named test_batch.py.

The first test will do a pretty common thing: provide input data to the function and compare the results. We will be using a plain literal list as input. The following is some example test code:

from batch import batches
def test_batch_on_lists():
    assert list(batches([1, 2, 3, 4, 5, 6], 1)) == [
        [1], [2], [3], [4], [5], [6]
    ]
    assert list(batches([1, 2, 3, 4, 5, 6], 2)) == [
        [1, 2], [3, 4], [5, 6]
    ]
    assert list(batches([1, 2, 3, 4, 5, 6], 3)) == [
        [1, 2, 3], [4, 5, 6]
    ]
    assert list(batches([1, 2, 3, 4, 5, 6], 4)) == [
        [1, 2, 3, 4], [5, 6],
    ]

The assert statement is the preferred way in pytest to test for the pre- and post-conditions of tested code units. pytest is able to inspect such assertions, introspect the expressions they evaluate, and, thanks to this, output detailed reports of test failures in a readable form.

The above is a popular structure for tests for small utilities and is often just enough to ensure they work as intended. Still, it does not clearly reflect our requirements, so maybe it would be worth restructuring it a little bit.

The following is an example of two extra tests that more explicitly map to our predefined requirements:

from itertools import chain
def test_batch_order():
    iterable = range(100)
    batch_size = 2
    output = batches(iterable, batch_size)
    assert list(chain.from_iterable(output)) == list(iterable)
def test_batch_sizes():
    iterable = range(100)
    batch_size = 2
    output = list(batches(iterable, batch_size))
    for batch in output[:-1]:
        assert len(batch) == batch_size
    assert len(output[-1]) <= batch_size

The test_batch_order() test ensures that the order of elements in batches is the same as in the source iterable. The test_batch_sizes() test ensures that all batches have the same size (with the exception of the last batch, which can be shorter).

We can also see a pattern unfolding in both tests. In fact, many tests follow a very common structure:

  1. Setup: This is the step where the test data and all other prerequisites are prepared. In our case the setup consists of preparing iterable and batch_size arguments.
  2. Execution: This is when the actual tested unit of code is put into use and the results are saved for later inspection. In our case it is a call to the batches() function.
  3. Validation: In this step we verify that the specific requirement is met by inspecting the results of unit execution. In our case these are all the assert statements used to verify the saved output.
  4. Cleanup: This is the step where all resources that could affect other tests are released or returned back to the state they were in before the setup step. We didn't acquire any such resources, so in our case this step can be skipped.
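
As a quick illustration, the following is the test_batch_sizes() test from above with those phases marked in comments (there is no cleanup comment because nothing needs to be released):

def test_batch_sizes():
    # setup: prepare the input data and parameters
    iterable = range(100)
    batch_size = 2
    # execution: call the tested unit and capture its output
    output = list(batches(iterable, batch_size))
    # validation: every batch except the last one has the requested size
    for batch in output[:-1]:
        assert len(batch) == batch_size
    assert len(output[-1]) <= batch_size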

According to the testing process outlined in The principles of test-driven development section, at this point our tests should fail, as we haven't provided any function implementation yet. Let's run the pytest runner and see how it goes:

$ pytest -v

The output we get may look as follows:

======================== test session starts ========================
platform darwin -- Python 3.9.2, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- .../Expert-Python-Programming-Fourth-Edition/.venv/bin/python
cachedir: .pytest_cache
rootdir: .../Expert-Python-Programming-Fourth-Edition/Chapter 10/01 - Writing tests with pytest
collected 3 items
test_batch.py::test_batch_on_lists FAILED                     [ 33%]
test_batch.py::test_batch_order FAILED                        [ 66%]
test_batch.py::test_batch_sizes FAILED                        [100%]
============================= FAILURES ==============================
________________________ test_batch_on_lists ________________________
    def test_batch_on_lists():
>       assert list(batches([1, 2, 3, 4, 5, 6], 1)) == [
            [1], [2], [3], [4], [5], [6]
        ]
E       TypeError: 'NoneType' object is not iterable
test_batch.py:7: TypeError
_________________________ test_batch_order __________________________
    def test_batch_order():
        iterable = range(100)
        batch_size = 2
        output = batches(iterable, batch_size)
>       assert list(chain.from_iterable(output)) == list(iterable)
E       TypeError: 'NoneType' object is not iterable
test_batch.py:27: TypeError
_________________________ test_batch_sizes __________________________
    def test_batch_sizes():
        iterable = range(100)
        batch_size = 2
>       output = list(batches(iterable, batch_size))
E       TypeError: 'NoneType' object is not iterable
test_batch.py:34: TypeError
====================== short test summary info ======================
FAILED test_batch.py::test_batch_on_lists - TypeError: 'NoneType' ...
FAILED test_batch.py::test_batch_order - TypeError: 'NoneType' obj...
FAILED test_batch.py::test_batch_sizes - TypeError: 'NoneType' obj...

As we can see, the test run failed with three individual test failures and we've got a detailed report of what went wrong. We have the same failure in each test. TypeError says that the NoneType object is not iterable, so it could not be converted to a list. This means that none of the requirements verified by our tests have been met yet. That's understandable because the batches() function doesn't do anything meaningful yet.

Now it's time to satisfy those tests. The goal is to provide a minimal working implementation. That's why we won't do anything fancy and will provide a simple and naïve implementation based on lists. Let's take a look at our first iteration:

from typing import Any, Iterable, List
def batches(
    iterable: Iterable[Any], batch_size: int
) -> Iterable[List[Any]]:
    results = []
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            results.append(batch)
            batch = []
    if batch:
        results.append(batch)
    return results

The idea is simple. We create one list of results and traverse the input iterable, actively creating new batches as we go. When a batch is full, we add it to the list of results and start a new one. When we are done, we check if there is an outstanding batch and add it to the results as well. Then we return the results.

This is a pretty naïve implementation that may not work well with arbitrarily large inputs, but it should satisfy our tests. Let's run the pytest command to see if it works:

$ pytest -v

The test result should now be as follows:

======================== test session starts ========================
platform darwin -- Python 3.9.2, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- .../Expert-Python-Programming-Fourth-Edition/.venv/bin/python
cachedir: .pytest_cache
rootdir: .../Expert-Python-Programming-Fourth-Edition/Chapter 10/01 - Writing tests with pytest
collected 3 items
test_batch.py::test_batch_on_lists PASSED                     [ 33%]
test_batch.py::test_batch_order PASSED                        [ 66%]
test_batch.py::test_batch_sizes PASSED                        [100%]
========================= 3 passed in 0.01s =========================

As we can see, all tests have passed successfully. This means that the batches() function satisfies the requirements specified by our tests. It doesn't mean the code is completely bug-free, but it gives us confidence that it works well within the area of conditions verified by the tests. The more tests we have and the more precise they are, the more confidence in the code correctness we have.

Our work is not over yet. We made a simple implementation and verified it works. Now we are ready to proceed to the step where we refine our code. One of the reasons for doing things this way is that it is easier to spot errors in tests when working with the simplest possible implementation of the code. Remember that tests are code too, so it is possible to make mistakes in tests as well.

If the implementation of a tested unit is simple and easy to understand, it will be easier to verify whether the tested code is wrong or there's a problem with the test itself.

The obvious problem with the first iteration of our batches() function is that it needs to store all the intermediate results in the results list variable. If the iterable argument is large enough (or even infinite), it will put a lot of stress on your application as it will have to load all the data into memory. A better way would be to convert that function into a generator that yields successive results. This can be done with only a little bit of tuning:

def batches(
    iterable: Iterable[Any], batch_size: int
) -> Iterable[List[Any]]:
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

Another approach would be to make use of iterators and the itertools module, as in the following example:

from itertools import islice
def batches(
    iterable: Iterable[Any], batch_size: int
) -> Iterable[List[Any]]:
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

That's what is really great about the TDD approach. We are now able to easily experiment with and tune the existing function implementation with a reduced risk of breaking things. You can test this for yourself by replacing the batches() function implementation with one of those shown above and running the tests to see if it meets the defined requirements.

Our example problem was small and simple to understand and so our tests were easy to write. But not every unit of code will be like that. When testing larger or more complex parts of code, you will often need additional tools and techniques that allow you to write clean and readable tests. In the next few sections we will review common testing techniques often used by Python programmers and show how to implement them with the help of pytest. The first one will be test parameterization.

Test parameterization

Using a direct comparison of the received and expected function output is a common method for writing short unit tests. It allows you to write clear and condensed tests. That's why we used this method in our first test_batch_on_lists() test in the previous section.

One problem with that technique is that it breaks the classic pattern of setup, execution, verification, and cleanup stages. You can't see clearly what instructions prepare the test context, which function call constitutes unit execution, and which instructions perform result verification.

The other problem is that when the number of input-output data samples increases, tests become overly large. It is harder to read them, and potential independent failures are not properly isolated. Let's recall the code of the test_batch_on_lists() test to better understand this issue:

def test_batch_on_lists():
    assert list(batches([1, 2, 3, 4, 5, 6], 1)) == [
        [1], [2], [3], [4], [5], [6]
    ]
    assert list(batches([1, 2, 3, 4, 5, 6], 2)) == [
        [1, 2], [3, 4], [5, 6]
    ]
    assert list(batches([1, 2, 3, 4, 5, 6], 3)) == [
        [1, 2, 3], [4, 5, 6]
    ]
    assert list(batches([1, 2, 3, 4, 5, 6], 4)) == [
        [1, 2, 3, 4], [5, 6],
    ]

Each assert statement is responsible for verifying one pair of input-output samples. But each pair can be constructed to verify different conditions of the initial requirements. In our case the first three statements could verify that the size of each output batch is the same. But the last assert verifies that the incomplete batch is also returned if the length of the iterable argument is not divisible by batch_size. The intention of the test isn't perfectly clear as it slightly breaks the "keep tests small and focused" principle.

We can slightly improve the test structure by moving the preparation of all the samples to the separate setup part of the test and then iterating over the samples in the main execution part. In our case this can be done with a simple dictionary literal:

def test_batch_with_loop():
    iterable = [1, 2, 3, 4, 5, 6]
    samples = {
        # even batches
        1: [[1], [2], [3], [4], [5], [6]],
        2: [[1, 2], [3, 4], [5, 6]],
        3: [[1, 2, 3], [4, 5, 6]],
        # batches with rest
        4: [[1, 2, 3, 4], [5, 6]],
    }
    for batch_size, expected in samples.items():
        assert list(batches(iterable, batch_size)) == expected

See how a small change to the test structure allowed us to annotate which samples are expected to verify a particular function requirement. We don't always have to use separate tests for every requirement. Remember: practicality beats purity.

Thanks to this change we have a clearer separation of the setup and execution parts of the test. We can now say that the execution of the batches() function is parameterized with the content of the samples dictionary. It is like running multiple small tests within a single test run.

Another problem with testing multiple samples within a single test function is that the test may break early. If the first assert statement fails, the test will stop immediately. We won't know whether subsequent assert statements would succeed or fail until we fix the first error and are able to proceed with further execution of the test. And having a full overview of all individual failures often allows us to better understand what's wrong with the tested code.

This problem cannot be easily solved in a loop-based test. Fortunately, pytest comes with native support for test parameterization in the form of the @pytest.mark.parametrize decorator. This allows us to move the parameterization of a test's execution step outside of the test body. pytest will be smart enough to treat each set of input parameters as a separate "virtual" test that will be run independently from other samples.

@pytest.mark.parametrize requires at least two positional arguments:

  • argnames: This is a list of argument names that pytest will use to provide test parameters to the test function as arguments. It can be a comma-separated string or a list/tuple of strings.
  • argvalues: This is an iterable of parameter sets for each individual test run. Usually, it is a list of lists or a tuple of tuples.

We could rewrite our last example to use the @pytest.mark.parametrize decorator as follows:

import pytest
@pytest.mark.parametrize(
    "batch_size, expected", [
        # even batches
        [1, [[1], [2], [3], [4], [5], [6]]],
        [2, [[1, 2], [3, 4], [5, 6]]],
        [3, [[1, 2, 3], [4, 5, 6]]],
        # batches with rest
        [4, [[1, 2, 3, 4], [5, 6]]]
    ]
)
def test_batch_parameterized(batch_size, expected):
    iterable = [1, 2, 3, 4, 5, 6]
    assert list(batches(iterable, batch_size)) == expected

If we now execute all the tests that we've written so far with the pytest -v command, we will get the following output:

======================== test session starts ========================
platform darwin -- Python 3.9.2, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- .../Expert-Python-Programming-Fourth-Edition/.venv/bin/python
cachedir: .pytest_cache
rootdir: .../Expert-Python-Programming-Fourth-Edition/Chapter 10/01 - Writing tests with pytest
collected 8 items
test_batch.py::test_batch_on_lists PASSED                     [ 12%]
test_batch.py::test_batch_with_loop PASSED                    [ 25%]
test_batch.py::test_batch_parameterized[1-expected0] PASSED   [ 37%]
test_batch.py::test_batch_parameterized[2-expected1] PASSED   [ 50%]
test_batch.py::test_batch_parameterized[3-expected2] PASSED   [ 62%]
test_batch.py::test_batch_parameterized[4-expected3] PASSED   [ 75%]
test_batch.py::test_batch_order PASSED                        [ 87%]
test_batch.py::test_batch_sizes PASSED                        [100%]
========================= 8 passed in 0.01s =========================

As you can see, the test report lists four separate instances of the test_batch_parameterized() test run. If any of those fails it won't affect the others.

Test parameterization effectively puts a part of classic test responsibility—the setup of the test context—outside of the test function. This allows for greater reusability of the test code and gives more focus on what really matters: unit execution and the verification of the execution outcome.

Another way of extracting the setup responsibility from the test body is through the use of reusable test fixtures. pytest already has great native support for reusable fixtures that is truly magical.

pytest's fixtures

The term "fixture" comes from mechanical and electronic engineering. It is a physical device that can take the form of a clamp or grip that holds the tested hardware in a fixed position and configuration (hence the name "fixture") to allow it to be consistently tested in a specific environment.

Software testing fixtures serve a similar purpose. They simulate a fixed environment configuration that tries to mimic the real usage of the tested software component. Fixtures can be anything from specific objects used as input arguments, through environment variable configurations, to sets of data stored in a remote database that are used during the testing procedure.

In pytest, a fixture is a reusable piece of setup and/or teardown code that can be provided as a dependency to the test functions. pytest has a built-in dependency injection mechanism that allows for writing modular and scalable test suites.

We've covered the topic of dependency injection for Python applications in Chapter 5, Interfaces, Patterns, and Modularity.

To create a pytest fixture you need to define a named function and decorate it with the @pytest.fixture decorator as in the following example:

import pytest
@pytest.fixture
def dependency():
    return "fixture value"

pytest runs fixture functions before test execution. The return value of the fixture function (here "fixture value") will be provided to the test function as an input argument. It is also possible to provide both setup and cleanup code in the same fixture function by using the following generator syntax:

@pytest.fixture
def dependency_as_generator():
    # setup code
    yield "fixture value"
    # teardown code

When generator syntax is used, pytest will obtain the yielded value of the fixture function and keep the function suspended until the test finishes its execution. After the test finishes, pytest will resume the execution of all used fixture functions just after the yield statement, regardless of the test result (failure or success). This allows for the convenient and reliable cleanup of the test environment.
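
As a small hedged illustration of this syntax (the scratch directory fixture below is an assumption for demonstration purposes, not one of this chapter's examples), a fixture can create a temporary directory for the test and reliably remove it afterward:

import shutil
import tempfile
import pytest


@pytest.fixture
def scratch_directory():
    # setup: create a fresh temporary directory for the test
    path = tempfile.mkdtemp()
    yield path
    # teardown: runs after the test, whether it passed or failed
    shutil.rmtree(path)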

To use a fixture within a test you need to use its name as an input argument of the test function:

def test_fixture(dependency):
    pass

When starting a pytest runner, pytest will collect all fixture uses by inspecting the test function signatures and matching the names with available fixture functions. By default, there are a few ways that pytest will discover fixtures and perform their name resolution:

  • Local fixtures: Tests are able to use all the fixtures that are available from the same module that they are defined in. These can be fixtures that are imported in the same module. Local fixtures always take precedence over shared fixtures.
  • Shared fixtures: Tests are able to use fixtures available in the conftest module stored in the same directory as the test module or any of its parent directories. A test suite can have multiple conftest modules. Fixtures from a conftest module that is closer in the directory hierarchy take precedence over those defined further away (see the short sketch after this list). Shared fixtures always take precedence over plugin fixtures.
  • Plugin fixtures: pytest plugins can provide their own fixtures. These fixture names will be matched last.
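
To illustrate shared fixtures, the following sketch assumes a hypothetical conftest.py placed in the same directory as a test module; the fixture becomes available to the test without any import:

# conftest.py (shared fixtures for all test modules in this directory)
import pytest


@pytest.fixture
def shared_dependency():
    return "shared fixture value"


# test_shared.py (no import from conftest is needed)
def test_shared_dependency(shared_dependency):
    assert shared_dependency == "shared fixture value"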

Last but not least, fixtures can be associated with specific scopes that decide the lifetime of fixture values. These scopes are extremely important for fixtures implemented as generators because they determine when cleanup code is executed. There are five scopes available:

  • "function" scope: This is the default scope. A fixture function with the "function" scope will be executed once for every individual test run and will be destroyed afterward.
  • "class" scope: This scope can be used for test methods written in the xUnit style (based on the unittest module). Fixtures with this scope are destroyed after the last test in a test class.
  • "module" scope: Fixtures with this scope are destroyed after the last test in the test module.
  • "package" scope: Fixtures with this scope are destroyed after the last test in the given test package (collection of test modules).
  • "session" scope: This is kind of a global scope. Fixtures with this scope live though the entire runner execution and are destroyed after the last test.

Different scopes of fixtures can be used to optimize the test execution as a specific environment setup may sometimes take a substantial amount of time to execute. If many tests can safely reuse the same setup, it may be reasonable to expand the default "function" scope to "module", "package", or even "session".
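
As a sketch of that optimization (using an in-memory SQLite database purely as a stand-in for an expensive resource), widening the scope makes the setup run once per test module rather than once per test:

import sqlite3
import pytest


@pytest.fixture(scope="module")
def database_connection():
    # setup: executed once, before the first test in the module
    connection = sqlite3.connect(":memory:")
    yield connection
    # teardown: executed once, after the last test in the module
    connection.close()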

Moreover, "session" fixtures can be used to perform global setup for the whole test run as well as a global cleanup. That's why they are often used with the autouse=True flag, which marks a fixture as an automatic dependency for a given group of tests. The scoping of autouse fixtures is as follows:

  • Module-level for the test module fixture: If a fixture with the autouse flag is included in the test module (a module with the test prefix), it will be automatically marked as a dependency of every test within that module.
  • Package-level for the test conftest module fixture: If a fixture with the autouse flag is included in a conftest module of a given test directory, it will be automatically marked as a dependency of every test in every test module within the same directory. This also includes subdirectories.
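
The following is a minimal sketch of such a global autouse fixture (seeding the random module is just an example assumption of what a global setup might do); it would typically live in a top-level conftest module:

import random
import pytest


@pytest.fixture(scope="session", autouse=True)
def deterministic_random_seed():
    # global setup: runs once, before the first test of the whole run
    random.seed(42)
    yield
    # global teardown would go after the yield; nothing to clean up here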

The best way to learn about using fixtures in various forms is by example. Our tests for the batches() function from the previous section were pretty simple and so didn't require the extensive use of fixtures. Fixtures are especially useful if you need to provide some complex object initialization or set up the state of external software components such as remote services or databases. In Chapter 5, Interfaces, Patterns, and Modularity, we discussed examples of code for tracking page view counts with pluggable storage backends, and one of those examples used Redis as a storage implementation. Testing those backends would be a perfect use case for pytest fixtures, so let's recall the common interface of the ViewsStorageBackend abstract base class:

from abc import ABC, abstractmethod
from typing import Dict
class ViewsStorageBackend(ABC):
    @abstractmethod
    def increment(self, key: str): ...
    @abstractmethod
    def most_common(self, n: int) -> Dict[str, int]: ...

Abstract base classes or any other types of interface implementations, like Protocol subclasses, are actually great when it comes to testing. They allow you to focus on the class behavior instead of the implementation.

If we would like to test the behavior of any implementation of ViewsStorageBackend, we could test for a few things:

  • If we receive an empty storage backend, the most_common() method will return an empty dictionary
  • If we increment a number of page counts for various keys and request a number of most common keys greater than or equal to the number of keys incremented, we will receive all tracked counts
  • If we increment a number of page counts for various keys and request a number of most common keys lower than the number of keys incremented, we will receive a shortened set of the most common elements

We will start with tests and then go over actual fixture implementation. The first test function for the empty storage backend will be really simple:

import pytest
import random
from interfaces import ViewsStorageBackend
@pytest.mark.parametrize(
    "n", [0] + random.sample(range(1, 101), 5)
)
def test_empty_backend(backend: ViewsStorageBackend, n: int):
    assert backend.most_common(n) == {}

This test doesn't require any elaborate setup. We could use a static set of n argument parameters, but additional parameterization with random values adds a nice touch to the test. The backend argument is a declaration of a fixture use that will be resolved by pytest during the test run.

The second test for obtaining a full set of increment counts will require more verbose setup and execution:

def test_increments_all(backend: ViewsStorageBackend):
    increments = {
        "key_a": random.randint(1, 10),
        "key_b": random.randint(1, 10),
        "key_c": random.randint(1, 10),
    }
    for key, count in increments.items():
        for _ in range(count):
            backend.increment(key)
    assert backend.most_common(len(increments)) == increments
    assert backend.most_common(len(increments) + 1) == increments

The test starts with the declaration of a literal dictionary variable with the intended increments. This simple setup serves two purposes: the increments variable guides the further execution step and also serves as validation data for two verification assertions. As in the previous test, we expect the backend argument to be provided by a pytest fixture.

The last test is quite similar to the previous one:

def test_increments_top(backend: ViewsStorageBackend):
    increments = {
        "key_a": random.randint(1, 10),
        "key_b": random.randint(1, 10),
        "key_c": random.randint(1, 10),
        "key_d": random.randint(1, 10),
    }
    for key, count in increments.items():
        for _ in range(count):
            backend.increment(key)
    assert len(backend.most_common(1)) == 1
    assert len(backend.most_common(2)) == 2
    assert len(backend.most_common(3)) == 3
    top2_values = backend.most_common(2).values()
    assert list(top2_values) == (
        sorted(increments.values(), reverse=True)[:2]
    )

The setup and execution steps are similar to the ones used in the test_increments_all() test function. If we weren't writing tests, we would probably consider moving those steps to separate reusable functions. But here it would probably have a negative impact on readability. Tests should be kept independent, so a bit of redundancy often doesn't hurt if it allows for clear and explicit tests. However, this is not a rule of course and always requires personal judgment.

Since all the tests are written down, it is time to provide a fixture. In Chapter 5, Interfaces, Patterns, and Modularity, we've included two implementations of backends: CounterBackend and RedisBackend. Ultimately, we would like to use the same set of tests for both storage backends. We will get to that eventually, but for now let's pretend that there's only one backend. It will simplify things a little bit.

Let's assume for now that we are testing only RedisBackend. It is definitely more complex than CounterBackend so we will have more fun doing that. We could write just one backend fixture but pytest allows us to have modular fixtures, so let's see how that works. We will start with the following:

from redis import Redis
from backends import RedisBackend
@pytest.fixture
def backend(redis_client: Redis):
    set_name = "test-page-counts"
    redis_client.delete(set_name)
    return RedisBackend(
        redis_client=redis_client,
        set_name=set_name
    )

redis_client.delete(set_name) removes the key in the Redis data store if it exists. We will use the same key in the RedisBackend initialization. The underlying Redis key that stores all of our increments will be created on the first storage modification, so we don't need to worry about non-existing keys. This way we ensure that every time the fixture is initialized, the storage backend is completely empty. The default fixture scope is "function", which means every test using that fixture will receive an empty backend.

Redis is not a part of most system distributions so you will probably have to install it on your own. Most Linux distributions have it available under the redis-server package name in their package repositories. You can also use Docker and Docker Compose. The following is a short docker-compose.yml file that will allow you to quickly start it locally:

version: "3.7"
services:
  redis:
    image: redis
    ports:
      - 6379:6379
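
Assuming Docker and Docker Compose are installed, the service defined above can be started in the background with the following command:

$ docker-compose up -d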

You can find more details about using Docker and Docker Compose in Chapter 2, Modern Python Development Environments.

You may have noticed that we didn't instantiate the Redis client in the backend() fixture and instead specified it as an input argument of the fixture function.

The dependency injection mechanism in pytest also covers fixture functions. This means you can request other fixtures inside of a fixture.

The following is an example of a redis_client() fixture:

from redis import Redis
@pytest.fixture(scope="session")
def redis_client():
    return Redis(host='localhost', port=6379)

To avoid over-complicating things we have just hardcoded the values for the Redis host and port arguments. Thanks to the above modularity it will be easier to replace those values globally if you ever decide to use a remote address instead.
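
If that ever becomes necessary, one possible refinement (the environment variable names here are hypothetical) is to read the connection details from environment variables and fall back to the hardcoded defaults:

import os
import pytest
from redis import Redis


@pytest.fixture(scope="session")
def redis_client():
    # assumption: optional REDIS_HOST/REDIS_PORT variables override the defaults
    return Redis(
        host=os.environ.get("REDIS_HOST", "localhost"),
        port=int(os.environ.get("REDIS_PORT", "6379")),
    )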

Save all the tests in the test_backends.py module, start the Redis server locally, and execute the pytest runner using the pytest -v command. You will get output that may look as follows:

======================= test session starts =========================
platform darwin -- Python 3.9.2, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- .../Expert-Python-Programming-Fourth-Edition/.venv/bin/python
cachedir: .pytest_cache
rootdir: .../Expert-Python-Programming-Fourth-Edition/Chapter 10/03 - Pytest's fixtures
collected 8 items
test_backends.py::test_empty_backend[0] PASSED                   [ 12%]
test_backends.py::test_empty_backend[610] PASSED                 [ 25%]
test_backends.py::test_empty_backend[611] PASSED                 [ 37%]
test_backends.py::test_empty_backend[7] PASSED                   [ 50%]
test_backends.py::test_empty_backend[13] PASSED                  [ 62%]
test_backends.py::test_empty_backend[60] PASSED                  [ 75%]
test_backends.py::test_increments_all PASSED                     [ 87%]
test_backends.py::test_increments_top PASSED                     [100%]
======================== 8 passed in 0.08s =========================

All tests passing means that we have succeeded in verifying the RedisBackend implementation. It would be great if we could do the same for CounterBackend. The most naïve thing to do would be to copy the tests and rewrite the test fixtures to now provide a new implementation of the backend. But this is a repetition that we would like to avoid.

We know that tests should be kept independent. Still, our three tests referenced only the ViewsStorageBackend abstract base class. So they should always be the same regardless of the actual implementation of the tested storage backends. What we have to do is to find a way to define a parameterized fixture that will allow us to repeat the same test over various backend implementations.

The parameterization of fixture functions is a bit different from the parameterization of test functions. The @pytest.fixture decorator accepts an optional params keyword argument that takes an iterable of fixture parameters. A fixture with the params keyword must also declare the use of a special built-in request fixture that, among other things, allows access to the current fixture parameter:

import pytest
@pytest.fixture(params=[param1, param2, ...])
def parametrized_fixture(request: pytest.FixtureRequest):
    return request.param

We can use the parameterized fixture and the request.getfixturevalue() method to dynamically load a fixture depending on a fixture parameter. The revised and complete set of fixtures for our test functions can now look as follows:

import pytest
from redis import Redis
from backends import RedisBackend, CounterBackend
@pytest.fixture
def counter_backend():
    return CounterBackend()
@pytest.fixture(scope="session")
def redis_client():
    return Redis(host='localhost', port=6379)
@pytest.fixture
def redis_backend(redis_client: Redis):
    set_name = "test-page-counts"
    redis_client.delete(set_name)
    return RedisBackend(
        redis_client=redis_client,
        set_name=set_name
    )
@pytest.fixture(params=["redis_backend", "counter_backend"])
def backend(request):
    return request.getfixturevalue(request.param)

If you now run the same test suite with the new set of fixtures, you will see that the number of executed tests has just doubled. The following is some example output of the pytest -v command:

======================== test session starts ========================
platform darwin -- Python 3.9.2, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- .../Expert-Python-Programming-Fourth-Edition/.venv/bin/python
cachedir: .pytest_cache
rootdir: .../Expert-Python-Programming-Fourth-Edition/Chapter 10/03 - Pytest's fixtures
collected 16 items
test_backends.py::test_empty_backend[redis_backend-0] PASSED     [  6%]
test_backends.py::test_empty_backend[redis_backend-72] PASSED    [ 12%]
test_backends.py::test_empty_backend[redis_backend-23] PASSED    [ 18%]
test_backends.py::test_empty_backend[redis_backend-48] PASSED    [ 25%]
test_backends.py::test_empty_backend[redis_backend-780] PASSED   [ 31%]
test_backends.py::test_empty_backend[redis_backend-781] PASSED   [ 37%]
test_backends.py::test_empty_backend[counter_backend-0] PASSED   [ 43%]
test_backends.py::test_empty_backend[counter_backend-72] PASSED  [ 50%]
test_backends.py::test_empty_backend[counter_backend-23] PASSED  [ 56%]
test_backends.py::test_empty_backend[counter_backend-48] PASSED  [ 62%]
test_backends.py::test_empty_backend[counter_backend-780] PASSED [ 68%]
test_backends.py::test_empty_backend[counter_backend-781] PASSED [ 75%]
test_backends.py::test_increments_all[redis_backend] PASSED      [ 81%]
test_backends.py::test_increments_all[counter_backend] PASSED    [ 87%]
test_backends.py::test_increments_top[redis_backend] PASSED      [ 93%]
test_backends.py::test_increments_top[counter_backend] PASSED    [100%]
======================== 16 passed in 0.08s ========================

Thanks to the clever use of fixtures, we have reduced the amount of testing code without impacting the test readability. We could also reuse the same test functions to verify classes that should have the same behavior but different implementations. So, whenever requirements change, we can be sure that we will be able to recognize differences between classes of the same interface.

You need to be cautious when designing your fixtures as the overuse of dependency injection can make understanding the whole test suite harder. Fixture functions should be kept simple and well documented.

Using fixtures to provide connectivity for external services like Redis is convenient because the installation of Redis is pretty simple and does not require any custom configuration to use it for testing purposes. But sometimes your code will be using a remote service or resource that you cannot easily provide in the testing environment or cannot perform tests against without making destructive changes. This can be pretty common when working with third-party web APIs, hardware, or closed libraries/binaries, for instance. In such cases, a common technique is to use fake objects or mocks that can substitute for real objects. We will discuss this technique in the next section.

Using fakes

Writing unit tests presupposes that you can isolate the unit of code that is being tested. Tests usually feed the function or method with some data and verify its return value and/or the side effects of its execution. This is mainly to make sure that:

  • Tests are concerned with an atomic part of the application, which can be a function, method, class, or interface
  • Tests provide deterministic, reproducible results

Sometimes, the proper isolation of the program component is not obvious or easily done. In the previous section we discussed the example of a testing suite that, among other things, was verifying a piece of code that interacted with a Redis data store. We provided the connectivity to Redis using a pytest fixture and we saw that it wasn't that hard. But did we test only our code, or did we test also the behavior of Redis?

In this particular case, including connectivity to Redis was a pragmatic choice. Our code did only a bit of work and left most of the heavy lifting to the external storage engine. It couldn't work properly if Redis didn't work properly. In order to test the whole solution, we had to test the integration of our code and the Redis data store. Tests like that are often called integration tests and are commonly used in testing software that heavily relies on external components.

But safe integration tests are not always possible. Not every service you will use will be as easy to start locally as Redis. Sometimes you will be dealing with those "special" components that cannot be replicated outside of ordinary production use.

In such cases, you will have to substitute the dependency with a fake object that simulates a real-life component.

To better understand the typical use cases for using fakes in tests, let's consider the following imaginary story: We are building a scalable application that provides our customers with the ability to track page counts on their sites in real time. Unlike our competitors, we offer a highly available and scalable solution with very low latency and the ability to run with consistent results in many datacenters across the globe. The cornerstone of our product is a small counter class from the backends.py module.

Having a highly available distributed hash map (the data type we used in Redis) that would ensure low latency in a multi-region setup isn't something trivial. Surely one Redis instance won't do what we advertise to our customers. Thankfully, a cloud computing vendor—ACME Corp—reached out to us recently, offering one of their latest beta products. It is called ACME Global HashMap Service and it does exactly what we want. But there's a catch: it is still in beta, and thus ACME Corp by their policy does not provide a sandbox environment that we can use for testing purposes yet. Also, for some unclear legal reasons, we can't use the production service endpoint in our automated testing pipelines.

So, what could we do? Our code grows every day. The planned AcmeBackend class will likely have additional code that handles logging, telemetry, access control, and a lot of other fancy stuff. We definitely want to be able to test it thoroughly. Therefore, we've decided to use a fake substitute of the ACME Corp SDK that we were supposed to integrate into our product.

The ACME Corp Python SDK comes in the form of the acme_sdk package. Among other things, it includes the following two interfaces:

from typing import Dict
class AcmeSession:
    def __init__(self, tenant: str, token: str): ...
class AcmeHashMap:
    def __init__(self, acme_session: AcmeSession): ...
    def incr(self, key: str, amount):
        """Increments any key by specific amount"""
        ...
    def atomic_incr(self, key: str, amount):
        """Increments any key by specific amount atomically"""
        ...
    def top_keys(self, count: int) -> Dict[str, int]:
        """Returns keys with top values"""
        ...

AcmeSession is an object that encapsulates the connection to ACME Corp services, and AcmeHashMap is the service client we want to use. We will most likely use the atomic_incr() method to increment page view counts, and top_keys() will provide us with the ability to obtain the most common pages.

To build a fake, we simply have to define a new class that has an interface that is compatible with our use of AcmeHashMap. We can take the pragmatic approach and implement only those classes and methods that we plan to use. The minimal implementation of AcmeHashMapFake could be as follows:

from collections import Counter
from typing import Dict
class AcmeHashMapFake:
    def __init__(self):
        self._counter = Counter()
    def atomic_incr(self, key: str, amount):
        self._counter[key] += amount
    def top_keys(self, count: int) -> Dict[str, int]:
        return dict(self._counter.most_common(count))
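
A quick sanity check (the key names are arbitrary) shows that the fake behaves like a tiny in-memory version of the service:

fake = AcmeHashMapFake()
fake.atomic_incr("/index", 3)
fake.atomic_incr("/about", 1)
assert fake.top_keys(1) == {"/index": 3}
assert fake.top_keys(10) == {"/index": 3, "/about": 1}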

We can use AcmeHashMapFake to provide a new fixture in the existing test suite for our storage backends. Let's assume that we have an AcmeBackend class in the backends module that takes a hash map client instance (here, our AcmeHashMapFake) as its only input argument. We could then provide the following two pytest fixture functions:

from backends import AcmeBackend
from acme_fakes import AcmeHashMapFake
@pytest.fixture
def acme_client():
    return AcmeHashMapFake()
@pytest.fixture
def acme_backend(acme_client):
    return AcmeBackend(acme_client)

Splitting the setup into two fixtures prepares us for what may come in the near future. When we finally get our hands on the ACME Corp sandbox environment, we will have to modify only one fixture:

from acme_sdk import AcmeHashMap, AcmeSession
@pytest.fixture
def acme_client():
    return AcmeHashMap(AcmeSession(..., ...))

To summarize, fakes provide the equivalent behavior for an object that we can't build during a test or that we simply don't want to build. This is especially useful for situations where you have to communicate with external services or access remote resources. By internalizing those resources, you gain better control of the testing environment and thus are able to better isolate the tested unit of code.

Building custom fakes can become a tedious task if you have to build many of them. Fortunately, the Python standard library comes with the unittest.mock module, which can be used to automate the creation of fake objects.

Mocks and the unittest.mock module

Mock objects are generic fake objects that can be used to isolate the tested code. They automate the process of building the fake object's input and output. Mock objects are used more heavily in statically typed languages, where monkey patching is harder, but they are still useful in Python for shortening the code that mimics external APIs.

There are a lot of mock libraries available in Python, but the most recognized one is unittest.mock, which is provided in the standard library.

unittest.mock was initially created as a third-party mock package available on PyPI. After some time, it was included in the standard library as a provisional package. To learn more about provisional standard library packages, visit https://docs.python.org/dev/glossary.html#term-provisional-api.

Mocks can almost always be used in place of custom fake objects. They are especially useful for faking external components and resources that we don't have full control over during the test. They are also an indispensable utility when we have to go against the prime TDD principle—that is, when we have to write tests after the implementation has been written.

We already discussed the example of faking the connectivity layer to the external resource in the previous section. Now we will take a closer look at a situation when we have to write a test for an already existing piece of code that doesn't have any tests yet.

Let's say we have the following send() function that is supposed to send email messages over the SMTP protocol:

import smtplib
import email.message
def send(
    sender, to,
    subject='None',
    body='None',
    server='localhost'
):
    """sends a message."""
    message = email.message.Message()
    message['To'] = to
    message['From'] = sender
    message['Subject'] = subject
    message.set_payload(body)
    client = smtplib.SMTP(server)
    try:
        return client.sendmail(sender, to, message.as_string())
    finally:
        client.quit()

It definitely doesn't help that the function creates its own smtplib.SMTP instance, which clearly represents an SMTP client connection. If we started with tests first, we would probably have thought of it in advance and utilized a minor inversion of control to provide the SMTP client as a function argument. But the damage is done. The send() function is used across our whole codebase and we don't want to start refactoring yet. We need to test it first.

The send() function is stored in a mailer module. We will start with a black-box approach and assume it doesn't need any setup. We create a test that naively tries to call the function and hope for success. Our first iteration will be as follows:

from mailer import send
def test_send():
    res = send(
        '[email protected]',
        '[email protected]',
        'topic',
        'body'
    )
    assert res == {}

Unless you have an SMTP server running locally, you will see the following output when running pytest:

$ py.test -v --tb line
======================= test session starts =========================
platform darwin -- Python 3.9.2, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- .../Expert-Python-Programming-Fourth-Edition/.venv/bin/python
cachedir: .pytest_cache
pytest-mutagen-1.3 : Mutations disabled
rootdir: .../Expert-Python-Programming-Fourth-Edition/Chapter 10/05 - Mocks and unittest.mock module
plugins: mutagen-1.3
collected 1 item
test_mailer.py::test_send FAILED                                [100%]
============================ FAILURES ===============================
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/socket.py:831: ConnectionRefusedError: [Errno 61] Connection refused
======================= short test summary info =======================
FAILED test_mailer.py::test_send - ConnectionRefusedError: [Errno 61...
======================== 1 failed in 0.05s ==========================

The --tb parameter of the py.test command can be used to control the length of the traceback output on test failures. Here we used --tb line to receive one-line tracebacks. Other values are auto, long, short, native, and no.

There go our hopes. The send function failed with a ConnectionRefusedError exception. If we don't want to run the SMTP server locally or send real messages by connecting to a real SMTP server, we will have to find a way to substitute the smtplib.SMTP implementation with a fake object.

In order to achieve our goal, we will use two techniques:

  • Monkey patching: We will modify the smtplib module on the fly during the test run in order to trick the send() function into using a fake object in place of the smtplib.SMTP class.
  • Object mocking: We will create a universal mock object that can act as a fake for absolutely any object. We will do that just to streamline our work.

Before we explain both techniques in more detail, let's take a look at an example test function:

from unittest.mock import patch
from mailer import send
def test_send():
    sender = "[email protected]"
    to = "[email protected]"
    body = "Hello jane!"
    subject = "How are you?"
    with patch('smtplib.SMTP') as mock:
        client = mock.return_value
        client.sendmail.return_value = {}
        res = send(sender, to, subject, body)
        assert client.sendmail.called
        assert client.sendmail.call_args[0][0] == sender
        assert client.sendmail.call_args[0][1] == to
        assert subject in client.sendmail.call_args[0][2]
        assert body in client.sendmail.call_args[0][2]
        assert res == {}

The unittest.mock.patch() context manager creates a new unittest.mock.Mock class instance and substitutes it under a specific import path. When the send() function tries to access the smtplib.SMTP attribute, it will receive the mock instance instead of the SMTP class object.

Mocks are quite magical. If you try to access any attribute of a mock outside of the set of names reserved by the unittest.mock module, it will return a new mock instance. Mocks can also be used as functions and, when called, also return a new mock instance.
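
A quick sketch of that behavior (the connect name is made up purely for the example):

from unittest.mock import Mock
mock = Mock()
assert isinstance(mock.anything.you.want, Mock)   # attribute access creates child mocks
mock.connect.return_value = 42                    # return values can be fixed explicitly
assert mock.connect("localhost") == 42
assert mock.connect.called                        # calls are recorded for later inspection
assert mock.connect.call_args[0] == ("localhost",)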

The send() function expects smtplib.SMTP to be a type object, so it will use the SMTP() call to obtain an instance of the SMTP client object. We use mock.return_value (return_value is one of the reserved names) to obtain the mock of that client object and control the return value of the client.sendmail() method.

After the execution of the send() function, we used a couple of other reserved names (called and call_args) to verify whether the client.sendmail() method was called and to inspect the call arguments.

Note that what we did here probably isn't a good idea, as we just retraced what the send() function's implementation does. You should avoid doing so in your own tests because there's no purpose in a test that just paraphrases the implementation of the tested function. Regardless, this was more to present the abilities of the unittest.mock module than to show how tests should be written.

The patch() context manager from the unittest.mock module is one way of dynamically monkey patching import paths during the test run. It can also be used as a decorator. It is quite an intricate feature, so it is not always easy to patch exactly what you want. Also, if you want to patch several objects at once, it will require a bit of nesting or decorator stacking, and that may be quite inconvenient.
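
As an example of the decorator form, the earlier test could be written as follows. This is a sketch equivalent to the context manager version; when several @patch decorators are stacked, the mocks are injected as arguments in bottom-up order:

from unittest.mock import patch
from mailer import send
@patch("smtplib.SMTP")
def test_send_with_decorator(smtp_mock):
    # smtp_mock replaces the SMTP class; return_value is the client instance
    smtp_mock.return_value.sendmail.return_value = {}
    assert send("[email protected]", "[email protected]", "subject", "body") == {}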

pytest offers an alternative way to perform monkey patching: a built-in monkeypatch fixture that acts as a patching proxy. If we wanted to rewrite the previous example using the monkeypatch fixture, we could do the following:

import smtplib
from unittest.mock import Mock
from mailer import send
def test_send(monkeypatch):
    sender = "[email protected]"
    to = "[email protected]"
    body = "Hello jane!"
    subject = "How are you?"
    smtp = Mock()
    monkeypatch.setattr(smtplib, "SMTP", smtp)
    client = smtp.return_value
    client.sendmail.return_value = {}
    res = send(sender, to, subject, body)
    assert client.sendmail.called
    assert client.sendmail.call_args[0][0] == sender
    assert client.sendmail.call_args[0][1] == to
    assert subject in client.sendmail.call_args[0][2]
    assert body in client.sendmail.call_args[0][2]
    assert res == {}

Monkey patching and mocks can easily be abused. That happens especially often when writing tests after the implementation. That's why mocks and monkey patching should be avoided if there are other ways of reliably testing software. Otherwise, you may end up with a project that has a lot of tests that are just empty shells and do not really verify software correctness. And there is nothing more dangerous than a false sense of security. There is also a risk that your mocks will assume behavior that is different from reality.

In the next section we will discuss the topic of quality automation, which dovetails with TDD.

Quality automation

There is no absolute scale that can definitively tell us whether the quality of some code is good or bad. Unfortunately, the abstract concept of code quality cannot be measured and expressed in the form of numbers. Instead, we can measure various metrics of the software that are known to be highly correlated with the quality of code. The following are a few of them:

  • The percentage of code covered by tests
  • The number of code style violations
  • The amount of documentation
  • Complexity metrics, such as McCabe's cyclomatic complexity
  • The number of static code analysis warnings

Many projects use code quality testing in their continuous integration workflows. A good and popular approach is to test at least the basic metrics (test coverage, static code analysis, and code style violations) and not allow the merging of any code to the main branch that scores poorly on these metrics.

In the following sections, we will discuss some interesting tools and methods that will allow you to automate the evaluation of select code quality metrics.

Test coverage

Test coverage, also known as code coverage, is a very useful metric that provides objective information on how well some given source code is tested. It is simply a measurement of how many and which lines of code are executed during the test run. It is often expressed as a percentage, and 100% coverage means that every line of code was executed during tests.

The most popular code coverage tool for measuring Python code is the coverage package, and it is freely available on PyPI. Its usage is very simple and consists only of two steps:

  1. Running the test suite using the coverage tool
  2. Reporting the coverage report in the desired format

The first step is to execute the coverage run command in your shell with the path to your script/program that runs all the tests. For pytest, it could be something like this:

$ coverage run $(which pytest)

The which command is a useful shell utility that returns in standard output the path to the executable of the other command. The $() expression can be used in many shells as a subexpression to substitute the command output into the given shell statement as a value.

Another way to invoke coverage run is using the -m flag, which specifies the runnable module. This is similar to invoking runnable modules with python -m. Both the pytest package and the unittest module provide their test runners as runnable modules:

$ python -m pytest
$ python -m unittest

So, in order to run test suites under the supervision of the coverage tool, you can use the following shell commands:

$ coverage run -m pytest
$ coverage run -m unittest

By default, the coverage tool will measure the test coverage of every module imported during the test run. It may thus also include external packages installed in a virtual environment for your project. You usually want to only measure the test coverage of the source code of your own project and exclude external sources. The coverage command accepts the --source parameter, which allows you to restrict the measurement to specific paths as in the following example:

$ coverage run --source . -m pytest

The coverage tool allows you to specify some configuration flags in the setup.cfg file. The example contents of setup.cfg for the above coverage run invocation would be as follows:

[coverage:run]
source =
    .

During the test run, the coverage tool will create a .coverage file with the intermediate results of the coverage measurement. After the run you can review the results by issuing the coverage report command.

To see how coverage measurement works in practice, let's say that we decided to do an ad hoc extension of one of the classes mentioned in the pytest's fixtures section but we didn't bother to test it properly. We would add the count_keys() method to CounterBackend as in the following example:

from collections import Counter
from typing import Dict
from interfaces import ViewsStorageBackend
class CounterBackend(ViewsStorageBackend):
    def __init__(self):
        self._counter = Counter()
    def increment(self, key: str):
        self._counter[key] += 1
    def most_common(self, n: int) -> Dict[str, int]:
        return dict(self._counter.most_common(n))
    def count_keys(self):
        return len(self._counter)

This count_keys() method hasn't been included in our interface declaration (the ViewsStorageBackend abstract base class), so we didn't anticipate its existence when writing our test suite.

Let's now perform a quick test run using the coverage tool and review the overall results. This is the example shell transcript showing what we could potentially see:

$ coverage run --source . -m pytest -q
................                                                 [100%]
16 passed in 0.12s
$ coverage report -m
Name               Stmts   Miss  Cover   Missing
------------------------------------------------
backends.py           21      1    95%   19
interfaces.py          7      0   100%
test_backends.py      39      0   100%
------------------------------------------------
TOTAL                 67      1    99%

All parameters and flags after the -m <module> parameter in the coverage run command will be passed directly to the runnable module invocation. Here the -q flag is a pytest runner flag saying that we want to obtain a short (quiet) report of the test run.

As we can see, all the tests have passed but the coverage report showed that the backends.py module is 95% covered by tests. This means that 5% of lines haven't been executed during the test run. This highlights that there is a gap in our test suite.
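
In this case, the gap is easy to close. A minimal sketch of a test covering the new count_keys() method, assuming we can instantiate CounterBackend directly, could look like this:

from backends import CounterBackend
def test_count_keys():
    backend = CounterBackend()
    backend.increment("key_a")
    backend.increment("key_a")
    backend.increment("key_b")
    # two distinct keys were incremented
    assert backend.count_keys() == 2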

The Missing column (shown thanks to the -m flag of the coverage report command) lists the numbers of the lines that were missed during the test run. For small modules with high coverage, this is enough to locate the coverage gaps. When coverage is very low, you will probably want a more in-depth report.

The coverage tool comes with a coverage html command that will generate an interactive coverage report in HTML format:

Figure 10.1: Example HTML coverage report highlighting coverage gaps

Test coverage is a very good metric that has a high correlation with the overall quality of code. Projects with low test coverage will statistically have more quality problems and defects. Projects with high coverage will usually have fewer defects and quality problems, assuming that tests are written according to the good practices highlighted in The principles of test-driven development section.

Even projects with 100% coverage can behave unpredictably and be riddled with bugs. In such situations, it may be necessary to use techniques that can validate the usefulness of existing test suites and uncover missed testing conditions. One such technique is mutation testing, discussed in the Mutation testing section.

Still, it is very easy to write meaningless tests that greatly increase test coverage. Always review the test coverage results of new projects with great care and don't treat the results as a definitive statement of project code quality.

Also, software quality is not only about how precisely software is tested but also about how easy it is to read, maintain, and extend. So, it is also about code style, common conventions, code reuse, and safety. Thankfully, measurement and validation of those programming areas can be automated to some extent.

In this section we have used the coverage tool in a "classic" way. If you use pytest you can streamline the coverage measurement using the pytest-cov plugin, which can automatically add a coverage run to every test run. You can read more about pytest-cov at https://github.com/pytest-dev/pytest-cov.
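
For example, with the plugin installed, a single command runs the tests and prints a report that includes the missing lines (the --cov and --cov-report options are provided by the plugin):

$ pytest --cov=. --cov-report=term-missing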

Let's start with code style automation and linters, as those are the most common examples of quality automation used by professional Python programmers.

Style fixers and code linters

Code is simply harder to read than it is to write. That's true regardless of the programming language. It is less likely for a piece of software to be high quality if it is written inconsistently or with bizarre formatting or coding conventions. This is not only because it will be hard to read and understand but also because it will be hard to extend and maintain at a constant pace. Software quality concerns both the present state of the code and its possible future.

To increase consistency across the codebase, programmers use tools that can verify the code style and various coding conventions. These tools are known as linters. Some such tools can also search for seemingly harmless but potentially problematic constructs, which include the following:

  • Unused variables or import statements
  • Access to protected attributes from outside of the hosting class
  • Redefining existing functions
  • Unsafe use of global variables
  • Invalid order of except clauses
  • Raising bad (non-exception) types
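
The following contrived snippet, written purely for illustration, packs several of these constructs into a few lines; a classic linter such as pylint would flag each of them:

import os                       # unused import
def process(data):
    try:
        return data._internal   # access to a protected attribute
    except Exception:           # invalid order: masks the clause below
        return None
    except AttributeError:      # unreachable except clause
        return None
def process(data):              # redefines an existing function
    return data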

Although linter is an umbrella term used for any tool that flags style errors/inconsistencies as well as suspicious and potentially dangerous code constructs, in recent years we have seen the ongoing specialization of linters. There are two main groups of linters commonly found in the Python community:

  • Style fixers: These are linters that focus on coding conventions and enforce specific styling guidelines. For Python, this can be PEP 8 guidelines or any arbitrary coding conventions. Style fixers are able to find styling errors and apply fixes automatically. Examples of some popular Python style fixers are black, autopep8, and yapf. There are also highly specialized style fixers that focus only on one aspect of code style. A prime example is isort, which focuses on sorting import statements.
  • Classic linters: These are linters that focus more on suspicious/dangerous constructs that may lead to bugs and/or undefined behaviors, although they can also include rulesets for specific style conventions. Classic linters usually work in a complain-only mode. They are able to flag issues but aren't able to fix those issues automatically. Examples of popular classic Python linters are pylint and pyflakes. A common style-only classic linter is pycodestyle.

There are also some experimental hybrid linters focusing on auto-fixing suspicious code constructs, like autoflake. Unfortunately, due to the delicate nature of those suspicious constructs (like unused import statements or variables), it is not always possible to perform a safe fix without introducing side effects. Those fixers should be used with great care.

Both style fixers and classic linters are indispensable when writing high-quality software, especially for professional use. Popular classic linters like pyflakes and pylint have a plethora of rules for errors, warnings, and automatic recommendations, and the list of rules is ever-expanding.

A large collection of rules means that introducing one of these linters to a preexisting large project usually requires some tuning to match common coding conventions. You may find some of these rules quite arbitrary (like default line lengths, specific import patterns, or maximum function argument numbers) and decide to silence some of the default checks. That requires a bit of effort, but it is an effort that really pays off in the long term.

Anyway, configuring linters is quite a tedious task, so we won't dive any deeper into that. Both pylint and pyflakes have great documentation that describes their use clearly. Definitely more exciting than classic linters are style fixers. They usually require very little configuration or no configuration at all.

They can bring a lot of consistency to an existing codebase with just a single command execution. We will see how this works using the examples from the code bundle for this book.

The examples in this book are generally written according to the PEP 8 code style. Still, being constrained by the medium, we had to make a few tweaks to make sure the code samples are clear, concise, and read nicely on paper. These tweaks were as follows:

  • Using a single empty new line instead of two whenever possible: There are places where PEP 8 recommends two (mostly for the separation of functions, methods, and classes). We decided to use one just to save space and avoid breaking longer code examples over two pages.
  • Using a lower characters-per-line limit: The PEP 8 line width limit is 79 characters. It's not a lot, but it turns out that it's still too much for a book. Books typically have a portrait format, and the standard 79-80 characters in a monospace font would usually not fit on a single line in print. Also, some readers may use ebook readers where the author has no control over the display of code samples. Using shorter lines makes it more likely that examples on paper and ebook readers will look the same.
  • Not grouping imports into sections: PEP 8 suggests grouping imports for standard library modules, third-party modules, and local modules and separating them with a single newline. That makes sense for code modules but in a book format, where we rarely use more than two imports per example, it would only introduce noise and waste precious page space.

These small deviations from the PEP 8 guideline are definitely justified for the book format. But the same code samples are also available in the Git repository dedicated to the book. If you were to open those samples in your favorite IDE just to see the code unnaturally compacted to the book format, you would probably feel a bit uneasy. It was thus necessary to include the code bundle in the format that is best for computer display.

The book code bundle includes over 100 Python source files, and writing them in two styles independently would be error-prone and cost us a lot of time. So, what did we do instead? We worked on the book samples in the Git repository using the informal book format. Every chapter was reviewed by multiple editors, so some examples had to be updated a few times. When we knew that everything was correct and working as expected, we simply used the black tool to discover all style violations and automatically apply fixes.

The usage of the black tool is pretty straightforward. You invoke it with the black <sources> command, where <sources> is a path to the source file or directory containing the source files you want to reformat. To reformat all source files in the current working directory, you can use:

$ black .

When you run black over the code bundle for this book, you will see output like the following:

(...)
reformatted /Users/swistakm/dev/Expert-Python-Programming-Fourth-Edition/Chapter 8/01 - One step deeper: class decorators/autorepr.py
reformatted /Users/swistakm/dev/Expert-Python-Programming-Fourth-Edition/Chapter 6/07 - Throttling/throttling.py
reformatted /Users/swistakm/dev/Expert-Python-Programming-Fourth-Edition/Chapter 8/01 - One step deeper: class decorators/autorepr_subclassed.py
reformatted /Users/swistakm/dev/Expert-Python-Programming-Fourth-Edition/Chapter 7/04 - Subject-based style/observers.py
reformatted /Users/swistakm/dev/Expert-Python-Programming-Fourth-Edition/Chapter 8/04 - Using __init__subclass__ method as alternative to metaclasses/autorepr.py
reformatted /Users/swistakm/dev/Expert-Python-Programming-Fourth-Edition/Chapter 9/06 - Calling C functions using ctypes/qsort.py
reformatted /Users/swistakm/dev/Expert-Python-Programming-Fourth-Edition/Chapter 8/04 - Using __init__subclass__ method as alternative to metaclasses/autorepr_with_init_subclass.py
All done! 
64 files reformatted, 37 files left unchanged.

The actual output of the black command was a few times longer and we had to truncate it substantially.

What would otherwise have cost us many hours of mundane work was, thanks to the black tool, done in just a few seconds.

Of course, you can't rely on every developer in the project consistently running black for every change committed to the central code repository. That's why the black tool can be run in check-only mode using the --check flag. Thanks to this, black can also be used as a style verification step in shared build systems that provide continuous integration of changes.
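
For example, a CI step could run the verification without modifying any files:

$ black --check .

The command exits with a non-zero status if any file would be reformatted, which is enough to fail the build.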

Tools like black definitely increase the quality of code by ensuring effortless and consistent formatting of code. Thanks to this, code will be easier to read and (hopefully) easier to understand. Another advantage is that it saves a lot of time that otherwise would be wasted on countless code formatting debates. But that's only a small aspect of the quality spectrum. There's no guarantee that consistently formatted code will have fewer bugs or will be visibly more maintainable.

When it comes to defect discovery and maintainability, classic linters are generally much better than any automated fixer. One subgroup of classic linters is especially great at finding potentially problematic parts of code. These are linters that are able to perform static type analysis. We will take a closer look at one of them in the next section.

Static type analysis

Python isn't statically typed, but it has optional type annotations. This single feature, with the help of highly specialized linters, can make Python code almost as type-safe as code written in classic statically typed languages.

The optional nature of typing annotations also has another advantage: you can decide whether or not to use type annotations at any time. Typed arguments, variables, functions, and methods are great for consistency and for avoiding silly mistakes but can get in your way when you attempt to do something unusual. Sometimes you just need to add an extra attribute to a preexisting object instance using monkey patching, or hack through a third-party library that doesn't seem to want to cooperate. It is naturally easier to do so if the language doesn't enforce type checks.

The leading static type checker for Python is currently mypy. It analyzes function and variable annotations that can be defined using the type hinting hierarchy from the typing module. It does not require you to annotate all of your code with types in order to work. This characteristic of mypy is especially great when maintaining legacy codebases, as typing annotations can be introduced gradually.

You can learn more about the typing hierarchy of Python by reading the PEP 484 -- Type Hints document, available at https://www.python.org/dev/peps/pep-0484/.

Like with other linters, the basic usage of mypy is pretty straightforward. Simply write your code (using type annotations or not) and verify its correctness with one mypy <path> command, where <path> is a source file or a directory containing multiple source files. mypy will recognize parts of the code that feature type annotations and verify that the usage of functions and variables matches the declared types.
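
For illustration, consider the following small annotated function (a hypothetical example, not taken from the book's code bundle):

from typing import List
def join_words(words: List[str], separator: str = " ") -> str:
    return separator.join(words)
join_words(["a", "b", "c"])  # accepted: the argument matches List[str]
join_words("abc")            # runs, but mypy reports an incompatible argument type

Running mypy against this file flags the second call, even though the code executes without raising an exception at runtime.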

Although mypy is an independent package available on PyPI, type hinting for the purpose of static analysis is fully supported by mainstream Python development in the form of the Typeshed project. Typeshed (available at https://github.com/python/typeshed) is a collection of library stubs with static type definitions for both the standard library and many popular third-party projects.

You'll find more information about mypy and its command-line usage on the official project page at http://mypy-lang.org.

So far, we have discussed the topic of quality automation with regard to application code. We used tests as a tool to increase overall software quality and measured test coverage to get some understanding of how well tests have been written. What we haven't spoken about yet is the quality of tests, and this is as important as the quality of the code that is being tested. Bad tests can give you a false sense of security and software quality. This can be almost as harmful as a lack of any tests at all.

Generally, the basic quality automation tools can be applied to test code as well. This means that linters and style fixers can be used to maintain the test codebase. But those tools do not give us any quantitative measurements of how well tests can detect new and existing bugs. Measuring the effectiveness and quality of tests requires slightly different techniques. One of those techniques is mutation testing. Let's learn what that is.

Mutation testing

Having 100% test coverage in your project is indeed a satisfying thing. But the higher it is, the quicker you will learn that it is never a guarantee of bullet-proof software. Countless projects with high coverage discover new bugs in parts of the code that are already covered by tests. How does that happen?

Reasons for that vary. Sometimes requirements aren't clear, and tests do not cover what they were supposed to cover. Sometimes tests include errors. In the end, tests are just code and like any other code are susceptible to bugs.

But sometimes bad tests are just empty shells—they execute some units of code and compare some results but don't actually care about really verifying software correctness. And amazingly, it is easier to fall into this trap if you really care about quality and measure the test coverage. Those empty shells are often tests written in the last stage just to achieve perfect coverage.

One of the ways to verify the quality of tests is to deliberately modify the code in a way that we know would definitely break the software and see if tests can discover the issue. If at least one test fails, we are sure that they are good enough to capture that particular error. If none of them fails, we may need to consider revisiting the test suite.

As possibilities for errors are countless, it is hard to perform this procedure often and repeatedly without the aid of tools and specific methodologies. One such methodology is mutation testing.

Mutation testing works on the hypothesis that most faults of software are introduced by small errors like off-by-one errors, flipping comparison operators, wrong ranges, and so on. There is also the assumption that these small mistakes cascade into larger faults that should be recognizable by tests.

Mutation testing uses well-defined modification operators known as mutations that simulate small and typical programmer mistakes. Examples of those can be:

  • Replacing the == operator with the is operator
  • Replacing a 0 literal with 1
  • Switching the operands of the < operator
  • Adding a suffix to a string literal
  • Replacing a break statement with continue

In each round of mutation testing, the original program is modified slightly to produce a so-called mutant. If the mutant can pass all the tests, we say that it survived the test. If at least one of the tests failed, we say that it was killed during the test. The purpose of mutation testing is to strengthen the test suite so that it does not allow new mutants to survive.

All of this theory may sound a little bit vague at this point, so we will now take a look at a practical example of a mutation testing session. We will try to test an is_prime() function that is supposed to verify whether an integer number is a prime number or not.

A prime number is a natural number greater than 1 that is divisible only by itself and 1. We don't want to duplicate the implementation logic inside the test, so there is no easy way to test the is_prime() function other than providing some sample data. We will start with the following simple test:

from primes import is_prime
def test_primes_true():
    assert is_prime(5)
    assert is_prime(7)
def test_primes_false():
    assert not is_prime(4)
    assert not is_prime(8)

We could use some parameterization, but let's leave that for later. Let's save that in the test_primes.py file and move to the is_prime() function. What we care about right now is simplicity, so we will create a very naïve implementation as follows:

def is_prime(number):
    if not isinstance(number, int) or number < 0:
        return False
    if number in (0, 1):
        return False
    for element in range(2, number):
        if number % element == 0:
            return False
    return True

It may not be the most performant implementation, but it's dead simple and so should be easy to understand. Only integers greater than 1 can be prime. We start by checking for type and against the values 0 and 1. For other numbers, we iterate over integers smaller than number and greater than 1. If number is not divisible by any of those integers it means it is a prime. Let's save that function in the primes.py file.

Now it's time to evaluate the quality of our tests. There are a few mutation testing tools available on PyPI. One that seems the simplest to use is mutmut, and we will use it in our mutation testing session. mutmut requires you to define a minor configuration that tells it how tests are run and how to mutate your code. It uses its own [mutmut] section in the common setup.cfg file. Our configuration will be the following:

[mutmut]
paths_to_mutate=primes.py
runner=python -m pytest -x

The paths_to_mutate variable specifies the paths of the source files that mutmut is allowed to mutate. Mutation testing in large projects can take a substantial amount of time, so it is crucial to tell mutmut exactly what it is supposed to mutate.

The runner variable specifies the command that is used to run tests. mutmut is framework agnostic so it supports any type of test framework that has a runner executable as a shell command. Here we use pytest with the -x flag. This flag tells pytest to abort testing on the first failure. Mutation testing is all about discovering surviving mutants. If any of the tests fail, we will already know that the mutant hasn't survived.

Now it's time to start the mutation testing session. The mutmut tool's usage is very similar to that of the coverage tool, so our work starts with the run subcommand:

$ mutmut run

The whole run will take a couple of seconds. After mutmut finishes validation of the mutants, we will see the following summary of the run:

- Mutation testing starting -
These are the steps:
1. A full test suite run will be made to make sure we
   can run the tests successfully and we know how long
   it takes (to detect infinite loops for example)
2. Mutants will be generated and checked
Results are stored in .mutmut-cache.
Print found mutants with `mutmut results`.
Legend for output:
 Killed mutants.   The goal is for everything to end up in this bucket.
 Timeout.          Test suite took 10 times as long as the baseline so were killed.
 Suspicious.       Tests took a long time, but not long enough to be fatal.
 Survived.         This means your tests needs to be expanded.
 Skipped.          Skipped.
1. Running tests without mutations
 Running...Done
2. Checking mutants
 15/15   8   0   0   7   0

The last line shows a short summary of the results. We can get a detailed view by running the mutmut results command. We got the following output in our session:

$ mutmut results
To apply a mutant on disk:
    mutmut apply <id>
To show a mutant:
    mutmut show <id>
Survived  (7)
---- primes.py (7) ----
8-10, 12-15

The last line shows the identifiers of the mutants that survived the test. We can see that 7 mutants survived, and their identifiers are in the 8-10 and 12-15 ranges. The output also shows useful information on how to review mutants using the mutmut show <id> command. You can also review mutants in bulk using the source file name as the <id> value.

We're doing this only for illustration purposes, so we will review only two mutants. Let's take a look at the first one with an ID of 8:

$ mutmut show 8
--- primes.py
+++ primes.py
@@ -2,7 +2,7 @@
     if not isinstance(number, int) or number < 0:
         return False
-    if number in (0, 1):
+    if number in (1, 1):
         return False
     for element in range(2, number):

mutmut has modified one of the values in our if number in (...) check, and our tests clearly didn't catch the issue. This means that we probably have to include those values in our testing conditions.

Let's now take a look at the last mutant with an ID of 15:

$ mutmut show 15
--- primes.py
+++ primes.py
@@ -1,6 +1,6 @@
 def is_prime(number):
     if not isinstance(number, int) or number < 0:
-        return False
+        return True
     if number in (0, 1):
         return False

mutmut has flipped the value of the bool literal after the type and value range checks. The mutant survived because we included a type check but didn't test what happens when the input value has the wrong type.

In our case, all those mutants could have been killed if we had included more samples in our tests. Extending the test suite to cover more corner cases and invalid values will make it more robust. The following is a revised set of tests:

from primes import is_prime
def test_primes_true():
    assert is_prime(2)
    assert is_prime(5)
    assert is_prime(7)
def test_primes_false():
    assert not is_prime(-200)
    assert not is_prime(3.1)
    assert not is_prime(0)
    assert not is_prime(1)
    assert not is_prime(4)
    assert not is_prime(8)
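
As mentioned earlier, parameterization could also be applied here. A sketch of the same samples expressed with pytest.mark.parametrize, so that every sample is reported as a separate test case, could look like this:

import pytest
from primes import is_prime
@pytest.mark.parametrize("number", [2, 5, 7])
def test_primes_true(number):
    assert is_prime(number)
@pytest.mark.parametrize("number", [-200, 3.1, 0, 1, 4, 8])
def test_primes_false(number):
    assert not is_prime(number)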

Mutation testing is a hybrid methodology because it not only verifies the quality of tests but can also highlight potentially redundant code. For instance, if we implement the test improvements from the above example, we will still see two surviving mutants:

# mutant 12
--- primes.py
+++ primes.py
@@ -1,5 +1,5 @@
 def is_prime(number):
-    if not isinstance(number, int) or number < 0:
+    if not isinstance(number, int) or number <= 0:
         return False
     if number in (0, 1):
# mutant 13
--- primes.py
+++ primes.py
@@ -1,5 +1,5 @@
 def is_prime(number):
-    if not isinstance(number, int) or number < 0:
+    if not isinstance(number, int) or number < 1:
         return False
     if number in (0, 1):

Those two mutants survive because the two if clauses we used can potentially handle the same condition. It means that the code we wrote is probably overly complex and can be simplified. We will be able to kill those two outstanding mutants if we collapse the two if statements into one:

def is_prime(number):
    if not isinstance(number, int) or number <= 1:
        return False
    for element in range(2, number):
        if number % element == 0:
            return False
    return True

Mutation testing is a really interesting technique that can strengthen the quality of tests. One problem is that it doesn't scale well. On larger projects, the number of potential mutants will be really big and in order to validate them, you have to run the whole test suite. It will take a lot of time to execute a single mutation session if you have many long-running tests. That's why mutation testing works well with simple unit tests but is very limited when it comes to integration testing. Still, it is a great tool for poking holes in those perfect coverage test suites.

In the past few sections, we have been focusing on systematic tools and approaches for writing tests and quality automation. These systematic approaches create a good foundation for your testing operations but do not guarantee that you will be efficient in writing tests or that testing will be easy. Testing can sometimes be tedious and boring. What makes it more fun is the large collection of utilities available on PyPI that allow you to reduce the boring parts.

Useful testing utilities

When it comes to efficiency in writing tests, it usually boils down to handling all those mundane or inconvenient problems like providing realistic data entries, dealing with time-sensitive processing, or working with remote services. Experienced programmers usually boost their effectiveness with the help of a large collection of small tools for dealing with all these small typical problems. Let's take a look at a few of them.

Faking realistic data values

When writing tests based on input-output data samples, we often need to provide values that have some meaning in our application:

  • Names of people
  • Addresses
  • Telephone numbers
  • Email addresses
  • Identification numbers like tax or social security identifiers

The easiest way around that is to use hardcoded values. We've already done that in the example of our test_send() function in the Mocks and unittest.mock module section:

def test_send():
    sender = "[email protected]"
    to = "[email protected]"
    body = "Hello jane!"
    subject = "How are you?"
    ...

The advantage of doing that is that whoever reads the test will be able to visually understand the values, so it can also serve test documentation purposes. But the problem of using hardcoded values is that it does not allow tests to efficiently search through the vast space of potential errors. We've already seen in the Mutation testing section how a small set of testing samples can lead to low-quality tests and a false sense of security about your code quality.

We could of course solve this problem by parameterizing tests and using much more realistic data samples. But this is a lot of mundane repeatable work and many developers are not willing to do that on a larger scale.

One way around this monotony of sample data sets is using a readily available generator of data entries that could provide realistic values. One such generator is the faker package available on PyPI. faker comes with a built-in pytest plugin, which provides a faker fixture that can be easily used in any of your tests. The following is the modified part of the test_send() function that utilizes the faker fixture:

from faker import Faker
def test_send(faker: Faker):
    sender = faker.email()
    to = faker.email()
    body = faker.paragraph()
    subject = faker.sentence()
    ...

On each run, faker will seed the test with different data samples. Thanks to this, you are more likely to discover potential issues. Also, if you want to run the same tests multiple times using various random values, you can use the following pytest parameterization trick:

import pytest
@pytest.mark.parametrize("iteration", range(10))
def test_send(faker: Faker, iteration: int):
    ...

faker has dozens of data provider classes, and each one has several data entry methods. Every method can be obtained directly through the Faker class instance. It also supports localization, so many provider classes are available in versions for different languages.
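
Outside of the fixture, the same provider methods are available on a manually constructed Faker instance, and a locale can be chosen at construction time (a short sketch):

from faker import Faker
fake = Faker("pl_PL")
fake.name()          # a locale-appropriate full name
fake.address()       # a locale-appropriate postal address
fake.phone_number()  # a locale-appropriate phone number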

faker can also provide date and time entries in various standards. What it can't do is freeze time. But don't worry, we have a different package for that.

Faking time values

It may happen that for some reason you would like to change the way your application experiences the passage of time. This could be useful in testing time-sensitive processing like work scheduling or inspecting the automatically assigned creation timestamps of specific objects.

You can of course always pause your application. On POSIX systems you can pause the process with the pause() system call. In Python you can set a breakpoint using the breakpoint() function. But that doesn't affect the passage of time. Time still flows. Also, when the application is suspended, it cannot continue processing, so you can't continue testing.

What we need to do instead is to trick our code into thinking that time is moving at a different rate or is stopped at a single point without interfering with normal execution. There is a great freezegun package on PyPI that is capable of doing exactly that.

The usage of freezegun is quite simple. It offers a @freeze_time decorator that can be used on a test function to freeze time at a specific date and time:

from freezegun import freeze_time
@freeze_time("1988-02-05 05:10:00")
def test_with_time():
    ...

During the test, all calls to the standard library functions that return current time values will return the value specified by the decorator parameter. Among other things, this means that time.time() will return the corresponding epoch value and datetime.datetime.now() will return a datetime object, both located at the same point in time, namely 1988-02-05 05:10:00.
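
For instance, inside such a test both time sources agree on the frozen moment (a minimal sketch):

from datetime import datetime
from freezegun import freeze_time
@freeze_time("1988-02-05 05:10:00")
def test_frozen_now():
    # datetime.now() returns the frozen moment instead of the real time
    assert datetime.now() == datetime(1988, 2, 5, 5, 10)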

The freeze_time() call can also be used as a context manager. It will return a special FrozenDateTimeFactory that allows you to precisely control the flow of time as in the following example:

from datetime import timedelta
from freezegun import freeze_time
with freeze_time("1988-02-04 05:10:00") as frozen:
    frozen.move_to("1988-02-05 05:10:00")
    frozen.tick()
    frozen.tick(timedelta(hours=1))

The move_to() method moves the current time context to a designated point in time (given as a string or a datetime object), and tick() progresses the time by a specified interval (1 second by default).

Freezing time should of course be done really carefully. If your application actively checks the current time with time.time() and waits until a certain time passes, you could easily lock it in an indefinite sleep.

Summary

The most important thing in developing software with TDD is always starting with tests. That's the only way you can ensure that code units are easily testable. This also naturally encourages good design practices like the single responsibility principle or the inversion of control. Sticking to those principles helps in writing good and maintainable code. And we've already seen how hard it is to test code reliably when tests are just an afterthought.

But caring about software correctness and maintainability does not end with testing and quality automation. These two allow us to verify the requirements we know about and fix bugs we have discovered. We can of course deepen the testing suite, and we've learned that mutation testing is an effective technique to discover potential testing blindspots, but this approach has its limits.

What follows next is usually the constant monitoring of the application and listening to bug reports submitted by the users. You probably don't want to treat your users as a free workforce and swarm them with countless bugs to discover, but in the long run they will be the best source of insights you can get. That's both due to their scale and the fact that they are the ones who have the most interest in having a working piece of software.

But before your users will be able to get their hands on your application, you need to package and ship it. That will be the sole topic of the next chapter. We will learn how to prepare a Python package for distribution on PyPI and also discuss common patterns for releasing web-based software and desktop applications.
