
4. From script to framework


The package we’ve created so far has a relatively basic script interface and no extensibility. The majority of applications do not need a way to be extended; it is often easier to package all optional code together rather than go to the trouble of maintaining plugins that are distributed apart from the main codebase. However, it can be very appealing to use a plugin architecture to manage (for example) optional features of an application.

If your direct users are other programmers, then it might be a good idea to provide a plugin architecture to make their jobs easier. This is often the case for open source frameworks, where external developers may create additional features, either for their own use or through consulting agreements for their clients. If you’re working on an open source project and are unsure if you should use a plugin architecture, I’d err on the side of including it. People will extend your code either way; it’s easier to make sense of bug reports that include well-defined plugins than it is for forks of your software that add additional features.

The users of our sensor tool aren’t necessarily programmers; they’re people who want to get information about a given system. However, it’s possible that they’ll want custom information for their particular use case, in which case they may well engage a programmer to add a new feature.

We’re already well on our way to being able to offer a plugin architecture; we have a well-defined class that describes the behavior of our sensors in the form of our Sensor[type] generic base class. Aside from a well-defined interface, we need a way of enumerating the sensors that are available to us. We do this in the show_sensors function, which hard-codes all the sensors in the file. This works perfectly well for applications that don’t need a plugin architecture, where all the sensors are written by the same developers and distributed as a single group. It fails as soon as we expect third parties to write custom sensors.

Writing a sensor plugin

For a moment, let’s think about what we’d want from this tool as a user. As well as the temperature and humidity sensors that many people might use, there are a few things I’d like to monitor that very few other people would find useful. One of them is the output of my roof-mounted solar panels. I have a script to pull readings over Bluetooth from my inverter, which uses an existing open source command-line tool to do the hard work of collecting and interpreting the data. I’d like to be able to incorporate this into my data collection.

As integration with a specific brand and model of solar panel inverter is not a useful component for most people, I am not going to integrate it into the core apd.sensors package. Instead, I’ll create a stand-alone plugin, as users might for their custom logic.

If I envisioned this being a generally useful sensor, I might be tempted to add this sensor to the same file as the existing ones and list it alongside the others in show_sensors. This would mean that every other user of the software would see the following as part of the script’s output:
> pipenv run sensors
...
Solar panel cumulative output
Unknown

Solar panel output isn’t a useful addition for the vast majority of people; it’s better as an optional component that users can install if needed. I wouldn’t even run this on all of the Raspberry Pi nodes that I have set up, as only one is connected to the solar panel inverter.

If you are building a server monitoring setup with this code, you likely need a few different sets of plugins. While you may have CPU and RAM usage figures on all machines, there are application-specific metrics for some server roles, for example, job queue length for machines that handle asynchronous tasks, the number of blocked hosts for a web application firewall server, or connection statistics for a database server.

There are two broad approaches to dealing with the fact that this sensor requires an outside tool. Firstly, I could create a Python distribution that includes the C code for the tool that I require. I would then have to arrange for that to be compiled and linked when my Python package is installed. I’d need to include error handling for problems with this tool not being installable and document its requirements. Once it’s installed, I could use that binary either through its existing script interface or directly through Python’s support for calling native code.

Alternatively, I could document that my sensor only works if that tool is installed and make the code assume that it is present. This massively simplifies the process for me, the developer, but makes installation harder for end-users. As I don’t envision this being generally useful, this is by far the most appealing choice. There is no sense in building something perfect over something good enough, especially when you have very few users.

I choose the path of assuming that the existing tool is in place; my code will not return a result if that program is missing. The standard library function subprocess.check_output(...) is very useful for this, as it makes it simple to call another process, wait for it to finish, and read both its exit status and what it printed.
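As a minimal sketch of that pattern (read_inverter is a hypothetical helper; the opensunny invocation is the one used in the full listing later in this chapter):

import subprocess
import typing as t

def read_inverter() -> t.Optional[bytes]:
    # Return the tool's combined stdout/stderr, or None if the
    # program is missing, times out, or exits with an error.
    try:
        return subprocess.check_output(
            ["opensunny", "-i", "00:80:25:00:00:00"],
            stderr=subprocess.STDOUT,
            timeout=15,
        )
    except (FileNotFoundError, subprocess.CalledProcessError,
            subprocess.TimeoutExpired):
        return None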

Developing the plugin

Developing this sensor is another great opportunity to use Jupyter notebooks for prototyping. We need a remote environment on a Raspberry Pi server, as discussed in Chapter 1, with the apd.sensors package installed into it. This allows us to connect through our local Jupyter instance and be able to import the Sensor base class from the version of apd.sensors installed on the server.

We can then begin prototyping, starting off with a Jupyter cell that only gets the data out of the inverter and another underneath that formats it as we’d like, as shown in Listing 4-1.
Listing 4-1

Prototype for extracting solar power

[Figure not reproduced: Jupyter cells that read raw data from the inverter and format it]

We can then build that up to contain a cell with the whole sensor subclass in and then “kick the tires” by checking that str(SolarCumulativeOutput) and similar function calls behave as expected. You may also like to take this opportunity to write some test bodies in Jupyter cells. There are a few projects that attempt to integrate pytest directly in Jupyter, such as ipytest, but very few of your tests should need to be run on the target host. Any that do require specific host hardware should be marked with @pytest.mark.skipif(...) decorators when converted to standard Python files. You should only write enough testing code in the notebook to make sure you’ve not made an error in the raw data collection.
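For example, a hardware-dependent test moved out of the notebook might be guarded like this (the skip condition is illustrative; pick one that matches your target hosts):

import platform

import pytest

@pytest.mark.skipif(
    platform.machine() != "armv7l",
    reason="Requires the Raspberry Pi connected to the inverter",
)
def test_value_is_float_or_none():
    from apd.sunnyboy_solar.sensor import SolarCumulativeOutput
    value = SolarCumulativeOutput().value()
    assert value is None or isinstance(value, float)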

The relevant cell of the prototyping can be brought into a sensor.py file, as shown in Listing 4-2.

import typing as t
import subprocess

from apd.sensors.sensors import Sensor

bt_addr = "00:80:25:00:00:00"

class SolarCumulativeOutput(Sensor[t.Optional[float]]):
    title = "Solar panel cumulative output"

    def value(self) -> t.Optional[float]:
        try:
            output: bytes = subprocess.check_output(
                ["opensunny", "-i", bt_addr],
                stderr=subprocess.STDOUT,
                timeout=15,
            )
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired,
                FileNotFoundError):
            # The tool failed, hung, or is not installed
            return None

        lines = [line for line in output.split(b"\n") if line]
        found = {}
        # Data format: datetime:INFO:[value] timestamp=0000 key=value
        for line in lines:
            start, value = line.rsplit(b"=", 1)
            _, key = start.rsplit(b" ", 1)
            found[key] = value

        try:
            yield_total = float(found[b"yield_total"][:-3].replace(b".", b""))
        except (KeyError, ValueError):
            return None
        return yield_total

    @classmethod
    def format(cls, value: t.Optional[float]) -> str:
        if value is None:
            return "Unknown"
        return "{} kWh".format(value / 1000)
Listing 4-2

apd/sunnyboy_solar/sensor.py

Even for this one-shot sensor, I’d strongly recommend creating a package, following the same approach as in Chapter 3. A package makes it easy to distribute the sensor code to our servers and to keep them up to date. You could write a single package that contains multiple custom sensors if you’d like to reduce the overhead involved, but don’t be tempted to work around the packaging system and just have free-floating Python files.

Once we’ve written our sensor, we add the relevant details to its setup.cfg, reuse the same setup.py as our apd.sensors package, and then build and publish a distribution to our local index server. Alternatively, if we were not entirely confident that we’d covered all the edge cases during development, we might choose to install an editable checkout from version control on the server in question. That would allow us to run its tests and potentially make tweaks without having to round-trip code from a local machine to the remote host.
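A minimal setup.cfg for the plugin package might look like the following sketch; the details follow the Chapter 3 pattern, and the entry_points section shown later in this chapter would be added to it:

[metadata]
name = apd.sunnyboy_solar
version = 1.0.0

[options]
packages = find_namespace:
package_dir =
  =src
install_requires =
  apd.sensors

[options.packages.find]
where = src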

Adding a new command option

We’ve just created a new package that includes a single sensor, but we don’t have any way of viewing its data from the command-line tool that we created in the previous chapter. That tool has a few built-in sensors and iterates over them when generating its output. We need to modify the script so that it can also show the values of sensors in other Python files.

To begin with, we can add a new option to apd.sensors that loads a sensor by its Python import location. That is, given the name of the sensor and the module it’s defined in, it would load that sensor and display its results. This is inspired by the --develop option in the pre-commit script for loading a hook by its path, for ease of testing.

With this option in place, we will be able to specify that we want the value of our solar power sensor instead of the built-in sensors, meaning we don’t have to write a special command to handle this sensor specifically.

Subcommands

We currently have a show_sensors function that includes the sensors to show as a hard-coded list. In this case, we’d want to do the same processing but change the way the list is generated based on command-line arguments. There are two broad approaches that we could take: either we create subcommands or we add command-line flags.

Subcommands might not be a term you’ve heard before, but you’ve certainly used them. Tools like Git make heavy use of subcommands, where the git command on its own has no meaning. In fact, the commands git, git --help, and git help are synonyms: they all print a usage guide to the terminal. The more common invocations of git, such as git add, git clone, and git commit, are all examples of subcommands. The Git process does not have a single function that implements all the behaviors of the program; it uses subcommands to group similar functionality together. Some git commands even use multiple levels of subcommand, such as git bisect start.

We could adopt this approach by moving the existing show_sensors(...) function to be a subcommand called show and add a new develop subcommand.

Click provides infrastructure called parameters for this purpose; you can add options and/or arguments to functions, which are exposed as part of the command-line interface. You should think of arguments as always being present, even though the end-user may not specify a value for them. If the user doesn’t supply a value, then a default value would be used. Arguments are the core bits of data that a function operates on.

On the other hand, options are flags that are not always passed. They can change the behavior merely by being present, or they can contain optional values similar to arguments.

This subcommand uses @click.argument to specify that some data is passed as a required parameter on the command line. The metavar= parameter of @click.argument sets the placeholder text shown to users in the --help output.
@click.argument("sensor_path", required=True, metavar="path")
In the following example, I haven’t yet included an implementation of get_sensor_by_path(...); it can just return a hard-coded instance of the solar power sensor for now. We will provide an implementation later; for now, we’re focusing on whether we should use subcommands or not. The following is an example of creating subcommands with click:
import click

@click.group()
def sensors() -> None:
    return

@sensors.command(help="Displays the values of the sensors")
def show() -> None:
    sensors = get_sensors()
    for sensor in sensors:
        click.secho(sensor.title, bold=True)
        click.echo(str(sensor))
        click.echo("")

@sensors.command(help="Displays the values of a specific sensor in development")
@click.argument("sensor_path", required=True, metavar="path")
def develop(sensor_path: str) -> None:
    sensor = get_sensor_by_path(sensor_path)
    click.secho(sensor.title, bold=True)
    click.echo(str(sensor))
    click.echo("")

if __name__ == "__main__":
    sensors()

Here, the entrypoint into the system is no longer a show_sensors() command, it is a sensors() group. The show_sensors() function has been renamed to show() and is now declared with @sensors.command rather than @click.command. The change in the command decorator is what connects this command to the group named sensors.

The console_scripts entrypoint would also have to be changed to match this refactoring:
[options.entry_points]
console_scripts =
  sensors = apd.sensors.sensors:sensors
Tip

Just like when we first added the console_scripts declaration, this change only takes effect during the installation of the package. You can force this by running pipenv install -e ., which is useful when you’re experimenting with different approaches. Once you’ve incremented the version number in __init__.py and re-run pipenv lock, Pipenv notices the change and automatically reinstalls the package. You can take advantage of this and set a version number like 1.1.0dev1. The dev marker lets you increment the version number without any risk of using a version number that you later need for a real release.

I would recommend incrementing the VERSION attribute to a dev release for features such as this unless there are only a small number of developers working on the code and they have no barriers to communication (such as timezone differences).

Once these changes have been made, it is possible to execute the subcommand to show the value of the in-development sensor we have. As I created an apd.sunnyboy_solar package that contains a sensor.py file and a SolarCumulativeOutput class, the string that represents my sensor is apd.sunnyboy_solar.sensor:SolarCumulativeOutput. I can check the output with the following command:
> pipenv run sensors develop apd.sunnyboy_solar.sensor:SolarCumulativeOutput
Solar panel cumulative output
14070.867 kWh

However, the transition to subcommands does mean that the command pipenv run sensors no longer behaves as it did previously. To get the data we expect for the preset sensors, we now need to run pipenv run sensors show. Because of this change, users cannot safely upgrade from an old version to a new one without changing the way they interact with the software. The upshot of this is that we need a large bump to the version number to communicate this change’s importance to our users.

If we consider the principles of the semantic versioning policy, we are making a change that both adds a feature and breaks backward compatibility. Breaking backward compatibility implies we should change the major version number, meaning any release of the software with this new subcommand layout would be version 2.0.0. Some developers find this unintuitive, as there is not a large conceptual change between versions 1.0.0 and 2.0.0, but that reluctance is usually borne of a desire to avoid large major version numbers out of a sense of aesthetics. I would strongly advise that you don’t shy away from incrementing the major version number when there is a backward-incompatible change, as it really does help users reason about which upgrades are safe to apply.

Command options

The other way of looking at this feature is that displaying a single sensor’s output is fundamentally the same task as displaying the output of all sensors, albeit with some different preferences. This is the core of the decision you need to make when deciding between subcommands and options: is the feature you’re adding another logical feature of the application, or is it a different behavior for an existing feature?

There is no hard-and-fast rule for differentiating the two; in our case, there are arguments to be made each way. In my opinion, changing the set of sensors being read and changing the format of the output are both arguments to the same underlying “show” function. My implementation uses the option approach, but this is a subtle distinction that depends very much on how you view the tool you’re creating.

To use the option approach, we need to add a @click.option line to the existing show_sensors(...) function that represents the path to the sensor that we should use instead of the hard-coded sensor list.

In our case, we would add an option called --develop which is not required and then use an if statement to decide if we should load the sensor referred to by the develop option or if we should use our hard-coded list as usual.
@click.command(help="Displays the values of the sensors")
@click.option(
    "--develop", required=False, metavar="path", help="Load a sensor by Python path"
)
def show_sensors(develop: Optional[str]) -> None:
    sensors: Iterable[Sensor[Any]]
    if develop:
        sensors = [get_sensor_by_path(develop)]
    else:
        sensors = get_sensors()
    for sensor in sensors:
        click.secho(sensor.title, bold=True)
        click.echo(str(sensor))
        click.echo("")
    return
This behaves very similarly to the subcommand approach: the default invocation is unchanged, and the new code path is available with
> pipenv run sensors --develop=apd.sunnyboy_solar.sensor:SolarCumulativeOutput
Solar panel cumulative output
14070.867 kWh

Error handling

The program we’ve written has, thus far, not had a real implementation of get_sensor_by_path(...), which is vital for it to be usable in the real world. We could write a naïve function that implements this, for example:

Unsafe version of get_sensor_by_path
import importlib
from typing import Any

def get_sensor_by_path(sensor_path: str) -> Any:
    module_name, sensor_name = sensor_path.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, sensor_name)()

This implementation has some significant flaws. Firstly, we are assuming that sensor_path always contains a colon. If this isn’t true, a ValueError is raised for insufficient values to unpack on the first line. Then, the next line could raise an ImportError and the third line an AttributeError. Those errors would be shown to the user as tracebacks, which is not very user-friendly. The more useful error messages we want to offer to the user, the more conditions we need to add.

That isn’t the biggest problem with this implementation, in any case. On the final line of this function, we want to instantiate the sensor that the user has selected, but we don’t know that it’s a sensor subclass. If the user ran pipenv run sensors --develop=sys:exit, then the command would call sys.exit() and immediately terminate. If they ran pipenv run sensors --develop=http.server:test, then the command would block and an unconfigured HTTP server would start up listening on port 8000 on all addresses.

These aren’t serious security vulnerabilities, as anyone who could run the sensor script could presumably run Python themselves and invoke these functions themselves. However, there is no good reason to allow users to do things that are clearly wrong and potentially damaging. It’s essential to consider the safety of such code every time you write it, as the trade-offs are always different.

The following implementation of get_sensor_by_path(...) traps all the common errors that could be caused by bad user input and re-raises them as RuntimeError with an appropriate user-facing message.

Implementation of get_sensor_by_path that optionally raises RuntimeError
def get_sensor_by_path(sensor_path: str) -> Sensor[Any]:
    try:
        module_name, sensor_name = sensor_path.split(":")
    except ValueError:
        raise RuntimeError("Sensor path must be in the format " "dotted.path.to.module:ClassName")
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        raise RuntimeError(f"Could not import module {module_name}")
    try:
        sensor_class = getattr(module, sensor_name)
    except AttributeError:
        raise RuntimeError(f"Could not find attribute {sensor_name} in " f"{module_name}")
    if (isinstance(sensor_class, type) and issubclass(sensor_class, Sensor) and sensor_class != Sensor):
        return sensor_class()
    else:
        raise RuntimeError(f"Detected object {sensor_class!r} is not " f"recognised as a Sensor type")
AUTOMATIC TYPE INFERENCE

It’s worth paying attention to the type annotations of both versions of this function. The first version had no check to see if the specified component was a sensor, so we declared it as returning Any.

If we create the following test code in src/apd/sensors/mypyexample.py and then run it through the mypy type checker, we see that it can’t identify the type of sensor:
import importlib
module = importlib.import_module("apd.sensors.sensors")
class_ = getattr(module, "PythonVersion")
sensor = class_()
reveal_type(sensor)
Result
mypyexample.py:6: note: Revealed type is 'Any'

The type checker cannot tell what type of object the class_ variable holds, as it would need to execute the code in import_module and getattr(...) to find out what is returned. In the preceding example, both strings are hard-coded, but if either were supplied by user input, determining the type would be impossible without knowing that input in advance. Therefore, as far as mypy is concerned, class_ and sensor can be of any type.

However, if we guard the line that instantiates class_ with some checks to determine if class_ is a type, and if that type is a subclass of Sensor, then mypy understands the situation well enough to detect that sensor is an instance of Sensor[Any].
import importlib
from .sensors import Sensor
module = importlib.import_module("apd.sensors.sensors")
class_ = getattr(module, "PythonVersion")
if isinstance(class_, type) and issubclass(class_, Sensor):
    sensor = class_()
    reveal_type(sensor)
Result
mypyexample.py:6: note: Revealed type is 'sensors.sensors.Sensor[Any]'

It is possible to force an instance to be considered as Sensor[Any] manually by using typing.cast(Sensor[Any], sensor), but this is rarely necessary and can potentially mask some errors.
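For example, the guard in the previous example could be replaced with an unchecked assertion:

import typing
sensor = typing.cast(Sensor[typing.Any], class_())  # mypy trusts this; no runtime check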

The calling function can then trap any RuntimeError that we generate and display a user-suitable error message by coercing the exception to a string:
if develop:
    try:
        sensors = [get_sensor_by_path(develop)]
    except RuntimeError as error:
        click.secho(str(error), fg="red", bold=True, err=True)
        sys.exit(ReturnCodes.BAD_SENSOR_PATH)

This prints the value of the RuntimeError in bold red text to the standard error stream and then exits the script with a known exit code. Exit codes are a handy feature of console scripts in Unix-like environments: they allow scripts that call the program to handle error cases without having to parse its error output.

We should use an enumeration to store the valid codes. enum.IntEnum is a base class for classes whose members are a mapping from a name to an integer; it includes some useful features, such as custom string representations, that help when debugging.
class ReturnCodes(enum.IntEnum):
    OK = 0
    BAD_SENSOR_PATH = 17

Many tools use low numbers and numbers near 255 for their own internal errors, so picking an offset of 16 makes it unlikely that our return codes conflict with those of other tools. In particular, we should not use 1 as anything but a general failure code. I have picked 17 as the exit code to represent errors where the arguments passed to the program mean that parsing could not succeed.
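For example, a wrapper script can branch on the exit code rather than parsing the error text (the sensor path here is deliberately invalid; output from a bash shell):

> pipenv run sensors --develop=not.a.module:Nothing
Could not import module not.a.module
> echo $?
17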

Off-loading parsing to Click with argument types

Click supports decoding the values passed in as parameters automatically. For some argument types, this makes intuitive sense; it is easier to declare that a parameter is a number (or a boolean value, etc.) than to always receive a string and have the command parse the value itself.

There are built-in types in Click that can be used to improve the usability of command-line tools. The simple types click.STRING, click.INT, click.FLOAT, and click.BOOL do relatively straightforward parsing of their input values, converting the norms of command-line invocations to Python values. For example, click.FLOAT calls float(...) on the input, and click.BOOL checks the input against a short list of known values that mean True or False, such as y/n, t/f, 1/0, and so on. It is possible to specify these types by using the Python type (i.e., str, int, float, bool) directly as a shorthand, and if no type is specified, Click attempts to guess the type.

There are some more involved types, such as click.IntRange, which applies validation on top of click.INT, and click.Tuple(...), which allows for specifying the types of options that take multiple values. For example, if you were working on a program that accepts locations, you might have a --coordinate option, which would be defined as follows:
@click.option(
    "--coordinate",
    nargs=2,
    metavar="LAT LON",
    help="Specify a latitude and longitude according to the WGS84 coordinate system",
    type=click.Tuple((click.FloatRange(-90, 90), click.FloatRange(-180, 180))),
)

Using these types ensures that data passed to your functions is valid and that end-users get useful error messages. It also significantly reduces the amount of parsing and validation logic you have to write. This can be especially useful with the most complex of all the types Click offers, click.File. This type allows you to specify that an open file reference should be passed to the function and closed properly after the function has finished executing. It also allows for specifying - to mean that the standard input and standard output streams should be used instead of files on the drive, which is a feature that many command-line tools offer and usually has to be added as a special case.
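A sketch of how click.File might be used (the --output option here is illustrative, not part of apd.sensors):

import click

@click.command()
@click.option(
    "--output",
    type=click.File("w"),
    default="-",  # click maps "-" to standard output
    help="File to write to; use - for standard output",
)
def export(output) -> None:
    # click opens the file before the function runs and closes it afterward
    output.write("sensor data\n")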

Perhaps the most surprisingly useful type is click.Choice, which takes a tuple of strings to check the value against. For example, click.Choice(("red", "green", "blue"), case_sensitive=False) provides a type validator that only accepts the strings “red”, “green”, and “blue”. Additionally, if your user has enabled autocomplete for your program, then these values can be suggested automatically when the user presses Tab while entering this argument.
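As a sketch, a complete option declaration using this type might look as follows (the --level option and its values are illustrative):

import click

@click.command()
@click.option(
    "--level",
    type=click.Choice(("debug", "info", "warning"), case_sensitive=False),
    default="info",
    help="Verbosity of the output",
)
def run(level: str) -> None:
    click.echo(f"Logging level: {level}")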

Custom click argument types

New types can be added to Click’s parsing system, which lets programs that regularly need the same command-line parsing split it out into a single reusable type and trust the framework to invoke it.

In our case, we only have one place where we expect a reference to a Python class to be passed as an argument, so there is no practical need to make “Python class” a reusable parameter type; we implement it here to demonstrate the approach. It’s relatively rare for this to be the right choice, but it’s certainly possible that you’ll need to do this for a project in future.

The following is a parser for Python class:
import functools
import importlib
import typing as t

from click.types import ParamType

from apd.sensors.sensors import Sensor

class PythonClassParameterType(ParamType):
    name = "pythonclass"

    def __init__(self, superclass=type):
        self.superclass = superclass

    def get_sensor_by_path(
        self, sensor_path: str, fail: t.Callable[[str], None]
    ) -> t.Any:
        try:
            module_name, sensor_name = sensor_path.split(":")
        except ValueError:
            return fail(
                "Class path must be in the format dotted.path.to.module:ClassName"
            )
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            return fail(f"Could not import module {module_name}")
        try:
            sensor_class = getattr(module, sensor_name)
        except AttributeError:
            return fail(f"Could not find attribute {sensor_name} in {module_name}")
        if (
            isinstance(sensor_class, type)
            and issubclass(sensor_class, self.superclass)
            and sensor_class != self.superclass
        ):
            return sensor_class
        else:
            return fail(
                f"Detected object {sensor_class!r} is not recognised as a "
                f"{self.superclass} type"
            )

    def convert(self, value, param, ctx):
        fail = functools.partial(self.fail, param=param, ctx=ctx)
        return self.get_sensor_by_path(value, fail)

    def __repr__(self):
        return "PythonClass"

# A PythonClassParameterType that only accepts sensors
SensorClassParameter = PythonClassParameterType(Sensor)
And here is the updated option call that uses the new parameter type:
@click.option(
    "--develop",
    required=False,
    metavar="path",
    help="Load a sensor by Python path",
    type=SensorClassParameter,
)
EXERCISE 4-1: ADDING AUTOCOMPLETE SUPPORT

I mentioned click.Choice earlier in this chapter, which provides support for autocompleting the values of certain options. It is possible to provide a callback for any option parameter to allow custom autocompletion.

It isn’t feasible to write a perfect autocomplete implementation for the --develop flag, as it involves autocompleting Python module names. It would be too difficult to scan the environment to determine all possibilities.

However, it is much easier to write an autocomplete implementation that completes the class part once the module has been entered. There is an example of one such implementation in the accompanying code for this chapter; try writing one yourself before looking at it.

The method signature for the autocomplete method is
def AutocompleteSensorPath(
    ctx: click.core.Context, args: list, incomplete: str
) -> t.List[t.Tuple[str, str]]:

The autocompletion method is enabled for an option by adding autocompletion=AutocompleteSensorPath as an argument.

When testing this, you may need to drop into a shell within the virtual environment and manually enable autocompletion for the sensors executable. For example, to enable autocomplete for the bash shell, you’d use
> pipenv shell
> eval "$(_SENSORS_COMPLETE=source_bash sensors)"

You need to manually enable autocompletion because autocomplete configuration is usually handled by a package installer and varies wildly between operating systems. The _SENSORS_COMPLETE=source_bash environment variable tells click to generate a bash autocomplete configuration instead of the normal handling. In the preceding example, this is processed immediately using eval, but you could also save the result in a file and then include that in your shell’s profile. You should check what the recommended approach is for your particular operating system and shell combination.

In addition, the : character may cause some shells to abort autocompletion. In this case, enclose the argument to --develop in quotation marks and try again.

Canned options

Finally, some uses of options are more common than others. The most common option that people want in their program is --help to display information about how a command is to be invoked. Click automatically adds this option to all commands unless you specify add_help_option=False in the @click.command(...) call. You can manually add help options using the @click.help_option(...) decorator function, for example, if you need to support different languages:
@click.command(help="Displays the values of the sensors")
@click.help_option("--hilfe")
def show_sensors(develop: str) -> int:
    ...

Another frequently desired function is --version, which prints the version of the command that is installed on the user’s computer. Like --help, this is implemented internally as an option with is_flag=True and is_eager=True, as well as having a specialized callback method. Options that have is_flag set do not have an explicit value attached; they are either present or not, which is represented by their value being either True or False.

The is_eager parameter marks an option as important to parse early in the processing of the command-line options. It allows --help and --version to implement their logic before the other arguments to the function have been parsed, which helps the program feel quick and responsive.

The --version option is added using the @click.version_option(...) decorator. The decorator takes the options prog_name, to specify the name of the current application, and version, to specify the current version number. Both are optional: if prog_name is not set, then the name the program was invoked with is used; if version is omitted, then the currently installed version is looked up from the Python environment. As such, it’s usually unnecessary to override either value, and the standard way to add this option is simply @click.version_option().

For some operations, such as deletions, you may want to get explicit confirmation from the user before continuing. This can be implemented with @click.confirmation_option(prompt="Are you sure you want to delete all records?"). The prompt= option is optional: if it is omitted, the default prompt of “Do you want to continue?” is used. Users can also skip the prompt by passing the command-line flag --yes.

Finally, there is a @click.password_option decorator, which prompts the user for a password immediately after the application starts. This defaults to asking the user for their password and to then confirm it, as though a password is being set, but the confirmation step can be disabled with confirmation_prompt=False. The password itself is not shown in the terminal, preventing it from being read by people near the computer at the time. If you use this option, you should ensure that the underlying command takes a password= option, so you have access to the password the user entered.
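A sketch of its use (the connect command is illustrative):

import click

@click.command()
@click.password_option(confirmation_prompt=False)
def connect(password: str) -> None:
    # The value entered at the hidden prompt arrives as password=
    click.echo("Connecting with the supplied credentials...")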

Allowing third-party sensor plugins

Now that we’ve upgraded the command-line tool to allow for testing our external sensor and we’ve completed an implementation that returns useful data, we have covered the rarer of the two use cases: helping developers write new plugins. The more common case is that of end-users, people who have installed a plugin sensor and want it to “just work.” It would not be appropriate to require these users to specify Python paths on every command-line invocation. We need a way of dynamically generating the list of available sensors.

There are two broad approaches that we can take to this problem: autodetection and configuration. Autodetection involves sensors registering themselves with the command-line tool in such a way that a list of all installed sensors is available at runtime. Alternatively, configuration relies on users maintaining a file that points to what sensors they want to install, which is then parsed at runtime.

Like most decisions between two approaches that we’ve made so far, there are strengths and weaknesses of both methods, and the trick is in picking the right one for your particular use case, as shown in Table 4-1.
Table 4-1

Comparison of configuration and autodetection of sensor types

| Comparison | Configuration | Autodetection |
| --- | --- | --- |
| Ease of installation | Install package and edit configuration file | Install package |
| Reorder plugins | Possible | Not possible |
| Override built-in plugin with a new implementation | Possible | Not possible |
| Exclude installed plugin | Possible | Not possible |
| Plugins can have parameters | Possible | Not possible |
| User-friendliness | Requires that users be comfortable editing configuration files | No additional steps are required |

Using a configuration-based system allows for a lot more control over the details of the plugin system. It is very well suited for plugin architectures that are likely to be used by developers or systems integrators as it allows them to configure the exact environment they want and to store this in version control. An example of this is the Django apps system. Apps are installed into the local environment but do not affect the website until they have been added to the settings.py file, at which point they can have plugin-specific settings added.

This approach is appropriate for Django and other systems where a customized deployment is created by mixing and matching third-party code and specially developed software. It is common to want to use a subset of the features offered by apps that have been installed, for example, by omitting some middleware options or setting up different URL schemes. This complexity stands in stark contrast to systems like WordPress, where installation of a plugin is intended to be well within the capabilities of nontechnical users. In this case, installing the plugin is sufficient itself, and more complex configuration is handled by the application rather than a central configuration file.

The autodetection method is significantly easier for nontechnical end-users, as they do not need to edit configuration files. It also makes the system less sensitive to typographical errors. For our use case, it’s unlikely that we would need to disable plugins, as users can ignore any data they don’t require. The ordering of plugins is similarly unimportant.

Overriding plugins with a new implementation may seem useful at first glance, but it would mean that collected values might have slightly different meanings depending on which version is used. For example, we might want to add a “Temperature” sensor that returns the system temperature rather than the ambient temperature. For some use cases, these might be interchangeable, but it’s best to keep the distinction in the data. We can always draw an equivalence when analyzing the data if required.

The one feature that a configuration-based system has that would be useful for this program is the ability to pass configuration values through to the sensors themselves. So far we have three sensors that would very much benefit from configuration: the temperature and humidity sensors are hard-coded to expect the sensor to be on IO pin D4 of the system they’re running on, and the solar panel sensor is hard-coded to a specific Bluetooth hardware address.

Hard-coded values are acceptable for private plugins that we don’t expect to work for other people (such as the solar panel monitor), but the temperature and humidity sensors are general-purpose sensors that we’d expect a range of users to install, so they need at least minimal configuration options.

Plugin detection using fixed names

It would be possible to write a plugin architecture that detects sensors defined in a file that’s importable by virtue of being in the current working directory. This approach effectively uses Python source code as the configuration format. For example, we could create a custom_sensors.py file and import any sensors that we want to use into that file.
import typing as t

from apd.sensors.sensors import Sensor

def get_sensors() -> t.Iterable[Sensor[t.Any]]:
    try:
        import custom_sensors
    except ImportError:
        discovered = []
    else:
        discovered = [
            attribute
            for attribute in vars(custom_sensors).values()
            if isinstance(attribute, type)
            and issubclass(attribute, Sensor)
        ]
    return discovered

The vars(custom_sensors) call here is the most unusual part of the code. It returns a dictionary of everything defined in that module, where the keys are the variable names and the values are the contents of those variables.
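For example, a user’s custom_sensors.py might contain nothing but imports of the sensors they want picked up (hypothetical contents):

# custom_sensors.py, placed in the working directory
from apd.sunnyboy_solar.sensor import SolarCumulativeOutput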

Note

The vars(...) function is helpful when debugging. If you have a variable obj and call vars(obj), you get a dictionary of the data set on that object. The related function dir(obj) returns a list of all attribute names resolvable on that instance. If you want to learn about an object during a debugging session, these are both very useful.

Using Python as the configuration format has the advantage of being very simple to implement, but writing a custom Python file is a very technical approach that most users wouldn’t enjoy. Users would have to manually copy the sensor code into this file (or import it from elsewhere) and manage any dependencies themselves. I cannot recommend this as a plugin architecture under any circumstances, but the idea of having a Python file be importable through being in the working directory is sometimes useful as a means of configuration, as we will see toward the end of this book.

Plugin detection using entrypoints

For our use case, I think that ease of use is the most important consideration, so we should adopt an approach that does not rely on configuration files for plugin detection. Python has a feature for implementing exactly this type of autodetection, which we briefly mentioned in a previous chapter: entrypoints. The entrypoint feature is what we used to declare that a function should be exposed as a console script (by far the most common use of the feature), but any Python code can use the entrypoint system for its own plugins.

A Python package can declare that it provides entrypoints, but as they’re a feature of the packaging tools, entrypoints cannot be set up from anywhere but a Python package’s metadata. When a Python distribution is created, much of the metadata is split out into files in a metadata directory. This is distributed along with the actual code. This parsed version of the metadata is what is scanned when code requests the registered values for an entrypoint. If a package provides entrypoints, then they can be enumerated as soon as the package is installed, making for a very effective way for code to discover plugins across packages.

Entrypoints are registered in a two-level namespace. The outer name is the entrypoint group, which is a simple string identifier. For the automatic generation of command-line tools, this group name is console_scripts (and, less commonly, gui_scripts for graphical tools). These group names do not have to be preregistered, so your packages can provide entrypoints that other software may use. If your end-user does not have that software installed, then they are ignored. The group name can be any string, which can then be used to query all the things referred to by the entrypoint.

You can find what entrypoint groups are in use in your Python installation using the pkg_resources module. This isn’t something you’d ever need to do in code, as evidenced by the fact that there isn’t an easy API for it, but it is interesting to look at when learning about the feature and how other Python tools use it. The following is a one-line program (excluding imports and formatting for ease of reading) that lists the entrypoint groups in use in a Python environment:
>>> functools.reduce(
...     set.union,
...     [
...         set(package.get_entry_map(group=None).keys())
...         for package in pkg_resources.working_set
...     ],
... )
...
{'nbconvert.exporters', 'egg_info.writers', 'gui_scripts', 'pygments.lexers', 'console_scripts', 'babel.extractors', 'setuptools.installation', 'distutils.setup_keywords', 'distutils.commands'}

The preceding example shows that there are nine different groups of entrypoints in use on my computer. Most of these are involved in Python package management, but three are other plugin systems installed on my computer. nbconvert.exporters is part of the Jupyter suite of tools; in the first chapter, we used nbconvert to convert our notebook to a standard Python script. That converter was found by checking this entrypoint, meaning that it would be possible for us to write our own exporters if desired. pygments.lexers is part of the pygments code formatting library; these entrypoints allow for new languages to be supported by pygments, and babel.extractors are entrypoints to help the i18n tool babel find translatable strings in different types of source code.

The second layer of namespacing is the name of the individual entrypoint. These must be unique within a group and are not inherently meaningful. You can search for a particular entrypoint name with iter_entry_points(group, name), but it’s more common to get all entrypoints within a group, with iter_entry_points(group).

All this means that we need to decide on a standard string to use as the entrypoint group name and have plugins declare that they provide entrypoints in this group. We must also update our core code to declare all the built-in sensors as plugins. We will use the string apd.sensors.sensor, as that is meaningful and unlikely to conflict with things other developers might do. The setup.cfg file of apd.sensors would have the entrypoints section modified as follows:
[options.entry_points]
console_scripts =
  sensors = apd.sensors.cli:show_sensors
apd.sensors.sensor =
  PythonVersion = apd.sensors.sensors:PythonVersion
  IPAddresses = apd.sensors.sensors:IPAddresses
  CPULoad = apd.sensors.sensors:CPULoad
  RAMAvailable = apd.sensors.sensors:RAMAvailable
  ACStatus = apd.sensors.sensors:ACStatus
  Temperature = apd.sensors.sensors:Temperature
  RelativeHumidity = apd.sensors.sensors:RelativeHumidity
The apd.sunnyboy_solar package uses the same entrypoint group name to add its one plugin to the set of known plugins, by declaring the following entrypoints section in its setup.cfg:
[options.entry_points]
apd.sensors.sensor =
  SolarCumulativeOutput = apd.sunnyboy_solar.sensor:SolarCumulativeOutput
The only change we’d need to make to the code to use entrypoints instead of hard-coding the sensors is to rewrite the get_sensors(...) function, as follows:
import typing as t

import pkg_resources

from apd.sensors.sensors import Sensor

def get_sensors() -> t.Iterable[Sensor[t.Any]]:
    sensors = []
    for sensor_class in pkg_resources.iter_entry_points("apd.sensors.sensor"):
        class_ = sensor_class.load()
        sensors.append(t.cast(Sensor[t.Any], class_()))
    return sensors

The cast here is not strictly necessary. We could also use the isinstance(...) guarding that we looked at for the --develop option; however, in this case, we’re willing to trust that plugin authors only create entrypoints that refer to valid sensors. Previously we were dealing with command-line invocations, where the chance of errors is rather higher. The effect of the cast is that we’re telling the typing framework that anything we get from loading an apd.sensors.sensor entrypoint and calling the result is a valid sensor.

As with the console_scripts entrypoints, we need to reinstall both of these packages to make sure that the entrypoints are processed. For a real release of the script, we would increment the minor version number, as we’ve introduced a new feature that doesn’t break backward compatibility, but as we’re working with a development installation, we re-run pipenv install -e . to force the reinstallation.

Configuration files

The alternative approach, which we dismissed earlier, was to write a configuration file. Python’s standard library supports parsing ini files, which are relatively easy for users to edit. Alternatively, a configuration format like YAML or TOML may make parsing easier, but editing would be less familiar for users.

Generally speaking, I would recommend using the ini format for configuration because of its familiarity to end-users. We also need to decide where to keep the ini files; they could be in a working directory, perhaps explicitly passed as a command-line argument if appropriate, or in a well-known default directory for the current operating system.

Wherever we decide to store the files, we would create a new argument to the command line that accepts the location of a configuration file to use; only the default behavior would differ. We would also need to create a function that reads the configuration file and instantiates the sensors using any relevant configuration data.

The configparser module in the standard library has a simple interface for loading ini formatted data from one or more files, so this is what we would use to load the configuration values. We’ll define our ini format as having a [config] section that contains a plugins= value. The items in the plugins value point at new sections, each of which defines a sensor with its (optional) configuration values. The following is a basic config.cfg file for apd.sensors:
[config]
plugins =
    PythonVersion
    IPAddress
[PythonVersion]
plugin = apd.sensors.sensors:PythonVersion
[IPAddress]
plugin = apd.sensors.sensors:IPAddresses

This shows some of the power of a configuration system, as this configuration file only loads two of the sensors, which greatly speeds up execution time. Less obvious is the fact that the sensor configuration blocks do not need to have the same name as the sensor classes from which they’re derived, for example, IPAddress vs. IPAddresses. The same sensor class can be listed multiple times in this way, making it possible to have a configuration that defines multiple instances of the same sensor with different parameters and collects data from each. A sensor could also be removed from the plugins line to disable it temporarily without needing to delete its configuration.
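For example, a configuration defining two differently wired instances of the temperature sensor might look like this (a sketch; the pin values are illustrative):

[config]
plugins =
    IndoorTemperature
    OutdoorTemperature
[IndoorTemperature]
plugin = apd.sensors.sensors:Temperature
pin = D4
[OutdoorTemperature]
plugin = apd.sensors.sensors:Temperature
pin = D18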

The parser for this config file maps the plugins line of the [config] section to the key config.plugins. Our code must check this value, extract the names, and then iterate over the sections to which it refers. It’s a good idea to keep the parsing and the sensor instantiation as independent functions, as this dramatically improves the testability of each. The testability would be slightly better if reading the config and parsing it were distinct functions, but as configparser provides this functionality, it makes sense to reduce the amount of file handling code we need to write ourselves and leave that to configparser.

Like the previous --develop helper functions, we would catch any relevant errors here and reraise as RuntimeError with a user-friendly message. These would then be raised to end-users as an error message and with a new return code to represent a problem with the config file:
def parse_config_file(
    path: t.Union[str, t.Iterable[str]]
) -> t.Dict[str, t.Dict[str, str]]:
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")
    try:
        plugin_names = [
            name for name in parser.get("config", "plugins").split() if name
        ]
    except configparser.NoSectionError:
        raise RuntimeError("Could not find [config] section in file")
    except configparser.NoOptionError:
        raise RuntimeError("Could not find plugins line in [config] section")
    plugin_data = {}
    for plugin_name in plugin_names:
        try:
            plugin_data[plugin_name] = dict(parser.items(plugin_name))
        except configparser.NoSectionError:
            raise RuntimeError(f"Could not find [{plugin_name}] section in file")
    return plugin_data

def get_sensors(path: t.Iterable[str]) -> t.Iterable[Sensor[t.Any]]:
    sensors = []
    for plugin_name, sensor_data in parse_config_file(path).items():
        try:
            class_path = sensor_data.pop("plugin")
        except KeyError:
            raise RuntimeError(
                f"Could not find plugin= line in [{plugin_name}] section"
            )
        sensors.append(get_sensor_by_path(class_path, **sensor_data))
    return sensors
The get_sensors(...) function now takes an iterable of strings, which are the possible paths to config files. A new --config parameter can be added to the show_sensors command, defaulting to "config.cfg", to collect the path value that will be passed to get_sensors(...).
@click.option(
    "--config",
    required=False,
    metavar="config_path",
    default="config.cfg",
    help="Load the specified configuration file",
)
Each sensor that needs a configuration variable must now accept it as a parameter to the __init__(...) function for the sensor class. This function defines the behavior for creating instances of the class and is where you would handle arguments to the class instantiation. The Temperature sensor would store the variables it needs in the __init__(...) function and then refer back to them in the value(...) function. The following is a partial listing of Temperature sensor that accepts configuration parameters:
class Temperature(Sensor[Optional[float]]):
    title = "Ambient Temperature"

    def __init__(self, board="DHT22", pin="D4"):
        self.board = board
        self.pin = pin

    def value(self) -> Optional[float]:
        try:
            import adafruit_dht
            import board
        except (ImportError, NotImplementedError):
            return None
        try:
            sensor_type = getattr(adafruit_dht, self.board)
            pin = getattr(board, self.pin)
            return sensor_type(pin).temperature
        except RuntimeError:
            return None

For some applications, you may want to provide more standardized loading of configuration files, in which case we can take advantage of the fact that configparser accepts a list of potential paths, so all possible config file locations can be passed in. A simple way of doing this would be to include /etc/apd.sensors/config.cfg and ~/.apd_sensors/config.cfg in the code, but this would not work on Windows. The Python package installer pip follows this configuration pattern; it has a very sophisticated code path for determining where config files could be, correctly implementing the expected locations for a range of platforms. As pip is MIT licensed, which is compatible with apd.sensors’s license, we can make use of those functions to make the sensors command feel like a well-behaved citizen of the different operating system ecosystems. An example of this is included in the accompanying code for this chapter.
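A sketch of the simple, non-Windows-aware version described above:

import os

# Candidate locations, lowest precedence first; configparser
# silently skips any paths that do not exist.
config_locations = [
    "/etc/apd.sensors/config.cfg",
    os.path.expanduser("~/.apd_sensors/config.cfg"),
    "config.cfg",
]
sensors = get_sensors(config_locations)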

Of course, changing the way that plugins are loaded has a knock-on effect for the tests of apd.sensors, meaning that some new fixtures and patches are required to support the substantive changes in cli.py. This does also allow us to be more flexible in our tests, by including configuration files that set up dummy sensors that are only ever used to test the infrastructure of the program.

Environment variables

A final way that we could approach the need to configure a small number of sensors is to make use of environment variables. These are variables that are made available to programs by the system, often containing information like library paths. We can write the few sensors that need configuration to look in the environment variables for their configuration. In this case, we wouldn’t need any loading of configuration files. We could use the autodetect style of sensor discovery and put the value extraction in the __init__ functions. Environment variables are exposed like a dictionary on the attribute os.environ, so the equivalent to the preceding implementation of Temperature that uses the environment would be
def __init__(self):
    self.board = os.environ.get("APD_SENSORS_TEMPERATURE_BOARD", "DHT22")
    self.pin = os.environ.get("APD_SENSORS_TEMPERATURE_PIN", "D4")

These could be set on the command line; however, the easiest way to define them when using pipenv is to use the “dotenv” standard, that is, creating a file called .env in the root of your pipenv installation that contains the relevant definitions. The pipenv run command loads this file and sets any variables defined every time a program is run. In this case, the file would look something like

.env
APD_SENSORS_TEMPERATURE_BOARD=DHT22
APD_SENSORS_TEMPERATURE_PIN=D4

Managing environment variables can be difficult on some platforms, but the .env file paradigm lets us treat them like a minimal configuration file, which makes them a good choice for very small amounts of configuration. There is a trade-off here similar to the one we considered for command-line parsing: we are choosing a simpler solution that offers no automatic parsing or validation over a more involved configuration system, and unlike the argument-parsing decision, this choice has a substantial effect on the usability of the program.

Approach for apd.sensors vs. similar programs

While there are arguments for using a comprehensive configuration file system, for my particular use case, I want something that works out of the box with minimal effort from end-users. People following along who are thinking of, say, server status aggregation may find themselves coming down on the other side of this decision. It very much depends on the user interface that you want to offer; it is always possible to write more complex code to support your exact requirements.

For example, some tools that make use of the subcommand style of command invocation actually define a config command to assist users in managing their config files, rather than having them edit them directly. The version control software git is an example of this, where any user-facing setting can be set using the git config command, specifying which of the various configuration files should be read.

For apd.sensors, at this stage, the path of least resistance is to use entrypoints to enumerate the plugins and environment variables to configure them, giving up the ability to exclude installed plugins or reorder them.

Summary

Much of the rest of this chapter has covered general software engineering topics, such as configuration file management and command-line tool user experience. The tools available to us in Python offer a lot of flexibility in these regards, so we can focus on making the best decision for our users, rather than being pushed toward an approach by limitations of the software.

The plugin system requirement is where Python really shines, however. The tool we’re building is somewhat unusual, in that it’s designed to allow other code to extend it. Although it’s common for developer frameworks to use plugin systems, most software that you write is a stand-alone application. This makes it all the more surprising that Python’s entrypoint system is so good. It is a fantastic way of defining simple plugin interfaces; it deserves to be better known.

The overall approach that we’ve taken with the software during the course of this chapter is to opt for the simplest user interface that we can offer to users. We have looked at alternatives that we may choose to introduce in future, but have decided that the features they offer are not important at this stage.

Our command-line tool is effectively complete. We have a working plugin interface that allows for configuration of individual sensor parameters and for application-specific sensors to be installed. The program is a stand-alone Python application that can be installed on the various computers we want to monitor. The best way of doing this is to use a new Pipfile, as the one we have been using so far is intended to build a development environment for the code.

The new Pipfile will use a released version of apd.sensors and the private distribution server we created to house releases. We can create this on a Raspberry Pi and then distribute the Pipfile and Pipfile.lock to all other Raspberry Pis that we want to install on.

Production deployment Pipfile
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[[source]]
name = "piwheels"
url = "https://piwheels.org/simple"
verify_ssl = true
[[source]]
name = "rpi"
url = "http://rpi4:8080/simple"
verify_ssl = false
[packages]
apd-sensors = "*"
[requires]
python_version = "3.8"

Additional resources

As this chapter has focused on decision-making more than features of Python, there are not many new pieces of software introduced in this chapter. The following online resources provide some additional detail on approaches that were not relevant to our use case, as well as some help with advanced use of command-line scripts on different operating systems: