Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

M. WilkesAdvanced Python Developmenthttps://doi.org/10.1007/978-1-4842-5793-7_1

1. Prototyping and environments

Matthew Wilkes¹

(1)

Leeds, West Yorkshire, UK

In this chapter, we will explore the different ways that you can experiment with what different Python functions do and when is an appropriate time to use those different options. Using one of those methods, we will build some simple functions to extract the first pieces of data that we will be aggregating and see how to build those into a simple command-line tool.

Prototyping in Python

During any Python project, from something that you’ll spend a few hours developing to projects that run for years, you’ll need to prototype functions. It may be the first thing you do, or it may sneak up on you mid-project, but sooner or later, you’ll find yourself in the Python shell trying code out.

There are two broad approaches for how to approach prototyping: either running a piece of code and seeing what the results are or executing statements one at a time and looking at the intermediate results. Generally speaking, executing statements one by one is more productive, but at times it can seem easier to revert to running a block of code if there are chunks you’re already confident in.

The Python shell (also called the REPL for Read, Eval, Print, Loop) is most people’s first introduction to using Python. Being able to launch an interpreter and type commands live is a powerful way of jumping right into coding. It allows you to run commands and immediately see what their result is, then adjust your input without erasing the value of any variables. Compare that to a compiled language, where the development flow is structured around compiling a file and then running the executable. There is a significantly shorter latency for simple programs in interpreted languages like Python.

Prototyping with the REPL

The strength of the REPL is very much in trying out simple code and getting an intuitive understanding of how functions work. It is less suited for cases where there is lots of flow control, as it isn’t very forgiving of errors. If you make an error when typing part of a loop body, you’ll have to start again, rather than just editing the incorrect line. Modifying a variable with a single line of Python code and seeing the output is a close fit to an optimal use of the REPL for prototyping.

For example, I often find it hard to remember how the built-in function filter(...) works. There are a few ways of reminding myself: I could look at the documentation for this function on the Python website or using my code editor/IDE. Alternatively, I could try using it in my code and then check that the values I got out are what I expect, or I could use the REPL to either find a reference to the documentation or just try the function out.

In practice, I generally find myself trying things out. A typical example looks like the following one, where my first attempt has the arguments inverted, the second reminds me that filter returns a custom object rather than a tuple or a list, and the third reminds me that filter includes only elements that match the condition, rather than excluding ones that match the condition.

>>> filter(range(10), lambda x: x == 5)

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

TypeError: 'function' object is not iterable

>>> filter(lambda x: x == 5, range(10))

>>> tuple(filter(lambda x: x == 5, range(10)))

(5,)

Note

The built-in function help(...) is invaluable when trying to understand how functions work. As filter has a clear docstring, it may have been even more straightforward to call help(filter) and read the information. However, when chaining multiple function calls together, especially when trying to understand existing code, being able to experiment with sample data and see how the interactions play out is very helpful.

If we do try to use the REPL for a task involving more flow control, such as the famous interview coding test question FizzBuzz (Listing 1-1), we can see its unforgiving nature.

for num in range(1, 101):

val = ''

if num % 3 == 0:

val += 'Fizz'

if num % 5 == 0:

val += 'Buzz'

if not val:

val = str(num)

print(val)

Listing 1-1

fizzbuzz.py – a typical implementation

If we were to build this up step by step, we might start by creating a loop that outputs the numbers unchanged:

>>> for num in range(1, 101):

... print(num)

...

100

At this point, we will see the numbers 1 to 100 on new lines, so we would start adding logic:

>>> for num in range(1, 101):

... if num % 3 == 0:

... print('Fizz')

... else:

... print(num)

...

Fizz

100

Every time we do this, we are having to reenter code that we entered before, sometimes with small changes, sometimes verbatim. These lines are not editable once they’ve been entered, so any typos mean that the whole loop needs to be retyped.

You may decide to prototype the body of the loop rather than the whole loop, to make it easier to follow the action of the conditions. In this example, the values of n from 1 to 14 are correctly generated with a three-way if statement, with n=15 being the first to be incorrectly rendered. While this is in the middle of a loop body, it is difficult to examine the way the conditions interact.

This is where you’ll find the first of the differences between the REPL and a script’s interpretation of indenting. The Python interpreter has a stricter interpretation of how indenting should work when in REPL mode than when executing a script, requiring you to have a blank line after any unindent that returns you to an indent level of 0.

>>> num = 15

>>> if num % 3 == 0:

... print('Fizz')

... if num % 5 == 0:

File "<stdin>", line 3

if num % 5 == 0:

SyntaxError: invalid syntax

In addition, the REPL only allows a blank line when returning to an indent level of 0, whereas in a Python file it is treated as an implicit continuation of the last indent level. Listing 1-2 (which differs from Listing 1-1 only in the addition of blank lines) works when invoked as python fizzbuzz_blank_lines.py.

for num in range(1, 101):

val = ''

if num % 3 == 0:

val += 'Fizz'

if num % 5 == 0:

val += 'Buzz'

if not val:

val = str(num)

print(val)

Listing 1-2

fizzbuzz_blank_lines.py

However, typing the contents of Listing 1-2 into a Python interpreter results in the following errors, due to the differences in indent parsing rules:

>>> for num in range(1, 101):

... val = ''

... if num % 3 == 0:

... val += 'Fizz'

... if num % 5 == 0:

... val += 'Buzz'

...

>>> if not val:

File "<stdin>", line 1

if not val:

IndentationError: unexpected indent

>>> val = str(num)

File "<stdin>", line 1

val = str(num)

IndentationError: unexpected indent

>>>

>>> print(val)

File "<stdin>", line 1

print(val)

IndentationError: unexpected indent

It’s easy to make a mistake when using the REPL to prototype a loop or condition when you’re used to writing Python in files. The frustration of making a mistake and having to reenter the code is enough to undo the time savings of using this method over a simple script. While it is possible to scroll back to previous lines you entered using the arrow keys, multiline constructs such as loops are not grouped together, making it very difficult to re-run a loop body. The use of the >>> and ... prompts throughout the session also makes it difficult to copy and paste previous lines, either to re-run them or to integrate them into a file.

Prototyping with a Python script

It is very much possible to prototype code by writing a simple Python script and running it until it returns the correct result. Unlike using the REPL, this ensures that it is easy to re-run code if you make a mistake, and code is stored in a file rather than in your terminal’s scrollback buffer.¹ Unfortunately, it does mean that it is not possible to interact with the code while it’s running, leading to this being nicknamed “printf debugging,” after C’s function to print a variable.

As the nickname implies, the only practical way to get information from the execution of the script is to use the print(...) function to log data to the console window. In our example, it would be common to add a print to the loop body to see what is happening for each iteration:

Tip

f-strings are useful for printf debugging, as they let you interpolate variables into a string without additional string formatting operations.

for num in range(1,101):

print(f"n: {num} n%3: {num%3} n%5: {num%5}")

The following is the result:

n: 1 n%3: 1 n%5: 1

n: 98 n%3: 2 n%5: 3

n: 99 n%3: 0 n%5: 4

n: 100 n%3: 1 n%5: 0

This provides an easily understood view at what the script is doing, but it does require some repetition of logic. This repetition makes it easier for errors to be missed, which can cause significant losses of time. The fact that the code is stored permanently is the biggest advantage this has over the REPL, but it provides a poorer user experience for the programmer. Typos and simple errors can become frustrating as there is a necessary context switch from editing the file to running it in the terminal.² It can also be more difficult to see the information you need at a glance, depending on how you structure your print statements. Despite these flaws, its simplicity makes it very easy to add debugging statements to an existing system, so this is one of the most commonly used approaches to debugging, especially when trying to get a broad understanding of a problem.

Prototyping with scripts and pdb

pdb, the built-in Python debugger, is the single most useful tool in any Python developer’s arsenal. It is the most effective way to debug complex pieces of code and is practically the only way of examining what a Python script is doing inside multistage expressions like list comprehensions.³

In many ways, prototyping code is a specialized form of debugging. We know that the code we’ve written is incomplete and contains errors, but rather than trying to find a single flaw, we’re trying to build up complexity in stages. Many of pdb’s features to assist in debugging make this easier.

When you start a pdb session, you see a (Pdb) prompt that allows you to control the debugger. The most important commands, in my view, are step, next, break, continue, prettyprint, and debug.⁴

Both step and next execute the current statement and move to the next one. They differ in what they consider the “next” statement to be. Step moves to the next statement regardless of where it is, so if the current line contains a function call, the next line is the first line of that function. Next does not move execution into that function; it considers the next statement to be the following statement in the current function. If you want to examine what a function call is doing, then step into it. If you trust that the function is doing the right thing, use next to gloss over its implementation and get the result.

break and continue allow for longer portions of the code to run without direct examination. break is used to specify a line number where you want to be returned to the pdb prompt, with an optional condition that is evaluated in that scope, for example, break 20 x==1. The continue command returns to the normal flow of execution; you won’t be returned to a pdb prompt unless you hit another breakpoint.

Tip

If you find visual status displays more natural, you may find it hard to keep track of where you are in a debugging session. I would recommend you install the pdb++ debugger which shows a code listing with the current line highlighted. IDEs, such as PyCharm, go one step further by allowing you to set breakpoints in a running program and control stepping directly from your editor window.

Finally, debug allows you to specify any arbitrary python expression to step into. This lets you call any function with any data from within a pdb prompt, which can be very useful if you’ve already used next or continue to pass a point before you realize that’s where the error was. It is invoked as debug somefunction() and modifies the (Pdb) prompt to let you know that you’re in a nested pdb session by adding an extra pair of parentheses, making the prompt ((Pdb)).⁵

Post-mortem debugging

There are two common ways of invoking pdb, either explicitly in the code or directly for so-called “post-mortem debugging.” Post-mortem debugging starts a script in pdb and will trigger pdb if an exception is raised. It is run through the use of python -m pdb yourscript.py rather than python yourscript.py. The script will not start automatically; you’ll be shown a pdb prompt to allow you to set breakpoints. To begin execution of the script, you should use the continue command. You will be returned to the pdb prompt either when a breakpoint that you set is triggered or when the program terminates. If the program terminates because of an error, it allows you to view the variables that were set at the time the error occurred.

Alternatively, you can use step commands to run the statements in the file one by one; however, for all but the simplest of scripts, it is better to set a breakpoint at the point you want to start debugging and step from there.

The following is the result of running Listing 1-1 in pdb and setting a conditional breakpoint (output abbreviated):

> python -m pdb fizzbuzz.py

> c:fizzbuzz_pdb.py(1)<module>()

-> def fizzbuzz(num):

(Pdb) break 2, num==15

Breakpoint 1 at c:fizzbuzz.py:2

(Pdb) continue

> c:fizzbuzz.py(2)fizzbuzz()

-> val = ''

(Pdb) p num

This style works well when combined with the previous script-based approach. It allows you to set arbitrary breakpoints at stages of the code’s execution and automatically provides a pdb prompt if your code triggers an exception without you needing to know in advance what errors occur and where.

The breakpoint function

The breakpoint() built-in⁶ allows you to specify exactly where in a program pdb takes control. When this function is called, execution immediately stops, and a pdb prompt is shown . It behaves as if a pdb breakpoint had previously been set at the current location. It’s common to use breakpoint() inside an if statement or in an exception handler, to mimic the conditional breakpoint and post-mortem debugging styles of invoking pdb prompts. Although it does mean changing the source code (and therefore is not suitable for debugging production-only issues), it removes the need to set up your breakpoints every time you run the program.

Debugging the fizzbuzz script at the point of calculating the value of 15 would be done by adding a new condition to look for num == 15 and putting breakpoint() in the body, as shown in Listing 1-3.

for num in range(1, 101):

val = ''

if num == 15:

breakpoint()

if num % 3 == 0:

val += 'Fizz'

if num % 5 == 0:

val += 'Buzz'

if not val:

val = str(num)

print(val)

Listing 1-3

fizzbuzz_with_breakpoint.py

To use this style when prototyping, create a simple Python file that contains imports you think you might need and any test data you know you have. Then, add a breakpoint() call at the bottom of the file. Whenever you execute that file, you’ll find yourself in an interactive environment with all the functions and data you need available.

Tip

I strongly recommend the library remote-pdb for debugging complex multithreaded applications. To use this, install the remote-pdb package and start your application with the environment variable PYTHONBREAKPOINT =remote_pdb.set_trace python yourscript.py. When you call breakpoint() in your code, the connection information is logged to the console. See the remote-pdb documentation for more options.

Prototyping with Jupyter

Jupyter is a suite of tools for interacting with languages that support a REPL in a more user-friendly way. It has extensive support for making it easier to interact with the code, such as displaying widgets that are bound to the input or output of functions, which makes it much easier for nontechnical people to interact with complex functions. The functionality that’s useful to us at this stage is the fact that it allows breaking code into logical blocks and running them independently as well as being able to save those blocks and return to them later.

Jupyter is written in Python but as a common front end for the Julia, Python, and R programming languages. It is intended as a vehicle for sharing self-contained programs that offer simple user interfaces, for example, for data analysis. Many Python programmers create Jupyter notebooks rather than console scripts, especially those who work in the sciences. We’re not using Jupyter in that way for this chapter; we’re using it because its features happen to align well with prototyping tasks.

The design goal of supporting multiple languages means it also supports Haskell, Lua, Perl, PHP, Rust, Node.js, as well as many others. Each of these languages has IDEs, REPLs, documentation websites, and so on. One of the most significant advantages of using Jupyter for this type of prototyping is that it allows you to develop a workflow that also works with unfamiliar environments and languages. For example, full-stack web programmers often have to work on both Python and JavaScript code. In contrast, scientists may need easy access to both Python and R. Having a single interface means that some of the differences between languages are smoothed over.

As Jupyter is not Python-specific and has built-in support for selecting what back end to use to run the current code, I recommend installing it in such a way that it’s conveniently available across your whole system. If you generally install Python utilities into a virtual environment, that’s fine.⁷ However, I have installed Jupyter into my user environment:

> python -m pip install --user jupyter

Note

As Jupyter has been installed in user mode, you need to ensure that the binaries directory is included in your system path. Installing into the global python environment or through your package manager is an acceptable alternative; it’s more important to be consistent with how your tools are installed than to use a variety of methods.

When prototyping with Jupyter, you can separate our code into logical blocks that you can run either individually or sequentially. The blocks are editable and persistent, as if we were using a script, but we can control which blocks run and write new code without discarding the contents of variables. In that way, it is similar to using the REPL, as we can try things out without any interruption from the coding flow to run a script.

There are two main ways of accessing the Jupyter tools, either through the Web using Jupyter’s notebook server or as a replacement for the standard Python REPL . Each works on the idea of cells, which are independent units of execution that can be re-run at any time. Both the notebook and the REPL use the same underlying interface to Python, called IPython. IPython has none of the trouble understanding indenting that the standard REPL does and has support for easily re-running code from earlier in a session.

The notebook is more user-friendly than the shell but has the disadvantage of only being accessible through a web browser rather than your usual text editor or IDE.⁸ I strongly recommend using the notebook interface as it provides a significant boost to your productivity through the more intuitive interface when it comes to being able to re-run cells and to edit multiline cells.

Notebooks

To begin prototyping, start the Jupyter notebook server and then create a new notebook using the web interface.

> jupyter notebook

Once the notebook has loaded, enter the code into the first cell, then click the run button. Many keyboard shortcuts that are common to code editors are present, along with automatic indenting when a new block is begun (Figure 1-1).

../images/481001_1_En_1_Chapter/481001_1_En_1_Fig1_HTML.jpg — Figure 1-1
fizzbuzz in a Jupyter notebook

Pdb works with Jupyter notebooks through the web interface, interrupting execution and displaying a new input prompt (Figure 1-2), in the same way that it does in the command line. All the standard pdb functionality is exposed through this interface, so the tips from the pdb section of this chapter can also be used in a Jupyter environment.

../images/481001_1_En_1_Chapter/481001_1_En_1_Fig2_HTML.jpg — Figure 1-2
pdb in a Jupyter notebook

Prototyping in this chapter

There are advantages and disadvantages to all the methods we’ve explored, but each has its place. For very simple one-liners, such as list comprehensions, I often use the REPL, as it’s the fastest to start up and has no complex control flow that would be hard to debug.

For more complex tasks, such as bringing functions from external libraries together and doing multiple things with them, a more featureful approach is usually more efficient. I encourage you to try different approaches when prototyping things to understand where the sweet spot is in terms of convenience and your personal preferences.

The various features of the different methods should go a long way to making it clear which one is best for your particular use case. As a general rule, I’d suggest using the leftmost entry in Table 1-1 that meets your requirements for the features you want to have available. Using something further to the right may be less convenient; using something too far to the left may mean you get frustrated trying to perform tasks that are easier in other tools.

Table 1-1

Comparison of prototyping environments

Feature	REPL	Script	Script + pdb	Jupyter	Jupyter + pdb
Indenting code	Strict rules	Normal rules	Normal rules	Normal rules	Normal rules
Re-running previous commands	Single typed line	Entire script only	Entire script or jump to the previous line	Logical blocks	Logical blocks
Stepping	Indented blocks run as one	The entire script runs as one	Step through statements	Logical blocks run as one	Step through statements
Introspection	Can introspect between logical blocks	No introspection	Can introspect between statements	Can introspect between logical blocks	Can introspect between statements
Persistence	Nothing is saved	Commands are saved	Commands are saved, interactions at the pdb prompt are not	Commands and output are saved	Commands and output are saved
Editing	Commands must be reentered	Any command can be edited, but the whole script must be re-run	Any command can be edited, but the whole script must be re-run	Any command can be edited, but the logical block must be re-run	Any command can be edited, but the logical block must be re-run

In this chapter, we will be prototyping a few different functions that return data about the system they’re running on. They will depend on some external libraries, and we may need to use some simple loops, but not extensively.

As we’re unlikely to have complex control structures, the indenting code feature isn’t a concern. Re-running previous commands will be useful as we’re dealing with multiple different data sources. It’s possible that some of these data sources will be slow, so we don’t want to be forced to always re-run every data source command when working on one of them. That discounts the REPL and is a closer fit for Jupyter than the script-based processes.

We want to be able to introspect the results of each data source, but we are unlikely to need to introspect the internal variables of individual data sources, which suggests the pdb-based approaches are not necessary (and, if that changes, we can always add in a breakpoint() call). We will want to store the code we’re writing, but that only discounts the REPL which has already been discounted. Finally, we want to be able to edit code and see the difference it makes.

If we compare these requirements to Table 1-1, we can create Table 1-2, which shows that the Jupyter approach covers all of the features we need well, whereas the script approach is good enough but not quite optimal in terms of ability to re-run previous commands.

For that reason, in this chapter we will be using a Jupyter notebook to do our prototyping. Throughout the rest of the chapter, we will cover some other advantages that Jupyter affords us, as well as some techniques for using it effectively as part of a Python development process, rather than to create stand-alone software distributed as a notebook.

Table 1-2

Matrix of whether the features of the various approaches match our requirements⁹

Feature	REPL	Script	Script + pdb	Jupyter	Jupyter + pdb
Indenting code	✔	✔	✔	✔	✔
Re-running previous commands	❌	⚠	⚠	✔	✔
Stepping	❌	❌	⚠	✔	⚠
Introspection	✔	✔	✔	✔	✔
Persistence	❌	✔	✔	✔	✔
Editing	❌	✔	✔	✔	✔

Environment setup

That said, we need to install libraries and manage dependencies for this project, which means that we need a virtual environment. We specify our dependencies using pipenv, a tool that handles both the creation of isolated virtual environments and excellent dependency management.

> python -m pip install --user pipenv

Why Pipenv

There has been a long history of systems to create isolated environments in Python. The one you’ll most likely have used before is called virtualenv . You may also have used venv, conda, buildout, virtualenvwrapper, or pyenv. You may even have created your own by manipulating sys.path or creating lnk files in Python’s internal directories.

Each of these methods has positives and negatives (except for the manual method, for which I can think of only negatives), but pipenv has excellent support for managing direct dependencies while keeping track of a full set of dependency versions that are known to work correctly and ensuring that your environment is kept up to date. That makes it a good fit for modern pure Python projects. If you’ve got a workflow that involves building binaries or working with outdated packages, then sticking with the existing workflow may be a better fit for you than migrating it to pipenv. In particular, if you’re using Anaconda because you do scientific computing, there’s no need to switch to pipenv. If you wish, you can use pipenv --site-packages to make pipenv include the packages that are managed through conda as well as its own.

Pipenv’s development cycle is rather long, as compared to other Python tools. It’s not uncommon for it to go months or years without a release. In general, I’ve found pipenv to be stable and reliable, which is why I’m recommending it. Package managers that have more frequent releases sometimes outstay their welcome, forcing you to respond to breaking changes regularly.

For pipenv to work effectively, it does require that the maintainers of packages you’re declaring a dependency on correctly declare their dependencies. Some packages do not do this well, for example, by specifying only a dependency package without any version restrictions when restrictions exist. This problem can happen, for example, because a new major release of a subdependency has recently been released. In these cases, you can add your own restrictions on what versions you’ll accept (called a version pin).

If you find yourself in a situation where a package is missing a required version pin, please consider contacting the package maintainers to alert them. Open source maintainers are often very busy and may not yet have noticed the issue – don’t assume that just because they’re experienced that they don’t need your help. Most Python packages have repositories on GitHub with an issue tracker. You see from the issue tracker if anyone else has reported the problem yet, and if not, it is an easy way to contribute to the packages that are easing your development tasks.

Setting up a new project

First, create a new directory for this project and change to it. We want to declare ipykernel as a development dependency. This package contains the code to manage an interface between Python and Jupyter, and we want to ensure that it and its library code is available within our new, isolated environment.

> mkdir advancedpython

> cd advancedpython

> pipenv install ipykernel --dev

> pipenv run ipython kernel install --user --name=advancedpython

The final line here instructs the copy of IPython within the isolated environment to install itself as an available kernel for the current user account, with the name advancedpython. This allows us to select the kernel without having to activate this isolated environment manually each time. Installed kernels can be listed with jupyter kernelspec list and removed with jupyter kernelspec remove.

Now we can start Jupyter and see options to run code against our system Python or our isolated environment. I recommend opening a new command window for this, as Jupyter runs in the foreground and we will need to use the command line again shortly. If you have a Jupyter server open from earlier in this chapter, I’d recommend stopping that one before opening the new one. We want to use the working directory we created previously, so change to that directory if the new window isn’t already there.

> cd advancedpython

> jupyter notebook

A web browser automatically opens and displays the Jupyter interface with a directory listing of the directory we created. This will look like Figure 1-3. With the project set up, it’s time to start prototyping. Choose “New” and then “advancedpython”.

We now see the main editing interface for a notebook. We have one “cell” that contains nothing and has not been executed. Any code we type into the cell can be run by clicking the “Run” button just above. Jupyter displays the output of the cell underneath, as well as a new empty cell for further code. You should think of a cell as being approximately equal to a function body. They generally contain multiple related statements which you want to run as a logical group.

../images/481001_1_En_1_Chapter/481001_1_En_1_Fig3_HTML.jpg — Figure 1-3
The Jupyter home screen in a new pipenv directory

Prototyping our scripts

A logical first step is to create a Python program that returns various information about the system it is running on. Later on, these pieces of information will be part of the data that’s aggregated, but for now some simple data is an appropriate first objective.

In the spirit of starting small, we’ll use the first cell for finding the version of Python we are running, shown in Figure 1-4. As this is exposed by the Python standard library and works on all platforms, it is a good placeholder for something more interesting.

../images/481001_1_En_1_Chapter/481001_1_En_1_Fig4_HTML.jpg — Figure 1-4
A simple Jupyter notebook showing sys.version_info

Jupyter shows the value of the last line of the cell, as well as anything explicitly printed. As the last line of our cell is sys.version_info, that is what is shown in the output.¹⁰

Another useful piece of information to aggregate is the current machine’s IP address. This isn’t exposed in a single variable; it’s the result of a few API calls and processing of information. As this requires more than a simple import, it makes sense to build up the variables step by step in new cells. When doing so, you can see at a glance what you got from the previous call, and you have those variables available in the next cell. This step-by-step process allows you to concentrate on the new parts of the code you’re writing, ignoring the parts you’ve completed.

By the end of this process, you will have something similar to the code in Figure 1-5, showing the various IP addresses associated with the current computer. At the second stage, it became apparent that there were both IPv4 and IPv6 addresses available. This makes the third stage slightly more complex, as I decided to extract the type of address along with the actual value. By performing these steps individually, we can adapt to things we learn in one when writing the next. Being able to re-run the loop body individually without changing window is a good example of where Jupyter’s strengths lie in prototyping.

../images/481001_1_En_1_Chapter/481001_1_En_1_Fig5_HTML.jpg — Figure 1-5
Prototyping a complex function in multiple cells¹¹

At this point, we have three cells to find the IP addresses, meaning there’s no one-to-one mapping between cells and logical components. To tidy this up, select the top cell and select “Merge Cell Below” from the edit menu. Do this twice to merge both additional cells, and the full implementation is now stored as a single logical block (Figure 1-6). This operation can now be run as a whole, rather than all three cells needing to have been run to produce the output. It is a good idea to tidy the contents of this cell up, too: as we no longer want to print the intermediate values, we can remove the duplicate addresses line.

../images/481001_1_En_1_Chapter/481001_1_En_1_Fig6_HTML.jpg — Figure 1-6
The result of merging the cells from Figure 1-5

Installing dependencies

A more useful thing to know would be how much load the system is experiencing. In Linux, this can be found by reading the values stored in /proc/loadavg. In macOS this is sysctl -n vm.loadavg. Both systems also include it in the output of other programs, such as uptime, but this is such a common task that there is undoubtedly a library that can help us. We don’t want to add any complexity if we can avoid it.

We’re going to install our first dependency, psutil . As this is an actual dependency of our code, not a development tool that we happen to want available, we should omit the --dev flag we used when installing dependencies earlier:

> pipenv install psutil

Note

We have no preferences about which version of psutil is needed, so we have not specified a version. The install command adds the dependency to Pipfile and the particular version that is picked to Pipfile.lock . Files with the extension .lock are often added to the ignore set in version control. You should make an exception for Pipfile.lock as it helps when reconstructing old environments and performing repeatable deployments.

When we return to the notebook, we need to restart the kernel to ensure the new dependency is available. Click the Kernel menu, then restart. If you prefer keyboard shortcuts , you can press <ESCAPE> to exit editing mode (the green highlight for your current cell will turn blue to confirm) and press 0 (zero) twice.

With that done, we can start to explore the psutils module. In the second cell, import psutil:

import psutil

and click Run (or, <SHIFT+ENTER> to run the cell from the keyboard). In a new cell, type psutil.cpu<TAB>.¹² You’ll see the members of psutil that jupyter can autocomplete for you. In this case, cpu_stats appears to be a good option, so type that out. At this point, you can press <SHIFT+TAB> to see minimal documentation on cpu_stats, which tells us that it doesn’t require any arguments.

Finish the line, so the cells now read:

import psutil

psutil.cpu_stats()

When we run the second cell, we see that cpu_stats gives us rather opaque information on the operating system’s internal use of the CPU. Let’s try cpu_percent instead. Using <SHIFT+TAB> on this function, we see that it takes two optional parameters. The interval parameter determines how long the function takes before it returns and works best if it’s nonzero. For that reason, we’ll modify the code as follows and get a simple floating-point number between 0 and 100:

import psutil

psutil.cpu_percent(interval=0.1)

Exercise 1-1: Explore the Library

Numerous other functions in the psutil library make good sources of data, so let’s create a cell for each function that looks interesting. There are different functions available on different operating systems, so be aware that if you’re following this tutorial on Windows, you have a slightly more limited choice of functions.

Try the autocomplete and help functions of Jupyter to get a feel for what information you find useful and create at least one more cell that returns data.

Including psutil’s import in each cell would be repetitive and not good practice for a Python file, but we do want to make sure it’s easy to run a single function in isolation. To solve this, we’ll move the imports to a new top cell, which is the equivalent of the module scope in a standard Python file.

Once you’ve created additional cells for your data sources, your notebook will look something like Figure 1-7.

../images/481001_1_En_1_Chapter/481001_1_En_1_Fig7_HTML.jpg — Figure 1-7
An example of a complete notebook following the exercise

While you’ve been doing this, the numbers in square brackets next to the cell have been increasing. This number is the sequence of operations that have been run. The number next to the first cell has stayed constant, meaning this cell hasn’t been run while we’ve experimented with the lower one.

In the Cell menu, there is an option to Run All, which will run each cell in sequence like a standard Python file. While it’s useful to be able to run all cells to test the entire notebook, being able to run each cell individually lets you split out complex and slow logic from what you’re working on without having to re-run it each time.

To demonstrate how this could be useful, we’ll modify our use of the cpu_percent function . We picked an interval of 0.1 as it’s enough to get accurate data. A larger interval, while less realistic, helps us see how Jupyter allows us to write expensive setup code while still allowing us to re-run faster parts without waiting for the slow ones.

import psutil

psutil.cpu_percent(interval=5)

Exporting to a .py file

Although Jupyter has served us well as a prototyping tool, it’s not a good match for the main body of our project. We want a traditional Python application, and the great presentation features of Jupyter aren’t useful right now. Jupyter has built-in support for exporting notebooks in a variety of formats, from slideshows to HTML, but the one we’re interested in is Python scripts.

The script to do the conversion is part of the Jupyter command, using the nbconvert (notebook convert) subcommand.¹³

> jupyter nbconvert --to script Untitled.ipynb

The untitled notebook we created is left unchanged, and a new Untitled.py file (Listing 1-4) is generated. If you renamed your notebook, then the names match the name you assigned. If you didn’t, and want to rename it now as you hadn’t noticed that it was just called Untitled.ipynb previously, click “Untitled” at the top of the notebook view and enter a new title.

#!/usr/bin/env python

# coding: utf-8

# In[1]:

import sys

sys.version_info

# In[4]:

import socket

hostname = socket.gethostname()

addresses = socket.getaddrinfo(hostname, None)

for address in addresses:

print(address[0].name, address[4][0])

# In[5]:

import psutil

# In[6]:

psutil.cpu_percent()

# In[7]:

psutil.virtual_memory().available

# In[8]:

psutil.sensors_battery().power_plugged

# In[ ]:

Listing 1-4

Untitled.py, generated from the preceding notebook

As you can see, each cell is separated from the others with comments, and the standard boilerplate around text encoding and shebang is present at the top of the file. Starting the prototyping in Jupyter rather than directly in a Python script or in the REPR hasn’t cost us anything in terms of flexibility or time; rather it gave us more control over how we executed the individual blocks of code while we were exploring.

We can now tidy this up to be a utility script rather than bare statements by moving the imports to the top of the file and converting each cell into a named function. The # In comments that show where cells started are useful reminders as to where a function should start. We also have to convert the code to return the value, not just leave it at the end of the function (or print it, in the case of the IP addresses). The result is Listing 1-5.

# coding: utf-8

import sys

import socket

import psutil

def python_version():

return sys.version_info

def ip_addresses():

hostname = socket.gethostname()

addresses = socket.getaddrinfo(hostname, None)

address_info = []

for address in addresses:

address_info.append(address[0].name, address[4][0])

return address_info

def cpu_load():

return psutil.cpu_percent()

def ram_available():

return psutil.virtual_memory().available

def ac_connected():

return psutil.sensors_battery().power_plugged

Listing 1-5

serverstatus.py

Building a command-line interface

These functions alone are not especially useful, most only each wrap an existing Python function. The obvious thing we want to do is to print their data, so you may wonder why we’ve gone to the trouble of creating single-line wrapper functions. This will be more obvious as we create more complex data sources and multiple ways of consuming them, as we will benefit from not having special-cased the simplest ones. For now, to make these useful, we can give users a simple command-line application that displays this data.

As we are working with a bare Python script rather than something installable, we use an idiom commonly called “ifmain ”. This is built into many coding text editors and IDEs as a snippet as it’s hard to remember and very unintuitive. It looks like this:

def do_something():

print("Do something")

if __name__ == '__main__':

do_something()

It really is quite horrid. The __name__ ¹⁴ variable is a reference to the fully qualified name of a module. If you import a module, the __name__ attribute will be the location from which it can be imported.

>>> from json import encoder

>>> type(encoder)

>>> encoder.__name__

'json.encoder'

However, if you load code through an interactive session or by providing a path to a script to run, then it can’t necessarily be imported. Such modules, therefore, get the special name "__main__". The ifmain trick is used to detect if that is the case. That is, if the module has been specified on the command line as the file to run, then the contents of the block will execute. The code inside this block will not execute when the module is imported by other code because the __name__ variable would be set to the name of the module instead. Without this guard in place, the command-line handler would execute whenever this module is imported, making it take over any program that uses these utility functions.

Caution

As the contents of the ifmain block can only be run if the module is the entrypoint into the application, you should be careful to keep it as short as possible. Generally, it’s a good idea to limit it to a single statement that calls a utility function. This allows that function call to be testable and is required for some of the techniques we will be looking at in the next chapter.

The sys module and argv

Most programming languages expose a variable named argv, which represents the name of the program and the arguments that the user passed on invocation. In Python, this is a list of strings where the first entry is the name of the Python script (but not the location of the Python interpreter) and any arguments listed after that.

Without checking the argv variable, we can only produce very basic scripts. Users expect a command-line flag that provides help information about the tool. Also, all but the simplest of programs need to allow users to pass configuration variables in from the command line.

The simplest way of doing this is to check the values that are present in sys.argv and handle them in conditionals. Implementing a help flag might look like Listing 1-6.

#!/usr/bin/env python

# coding: utf-8

import socket

import sys

import psutil

HELP_TEXT = """usage: python {program_name:s}

Displays the values of the sensors

Options and arguments:

--help: Display this message"""

def python_version():

return sys.version_info

def ip_addresses():

hostname = socket.gethostname()

addresses = socket.getaddrinfo(socket.gethostname(), None)

address_info = []

for address in addresses:

address_info.append((address[0].name, address[4][0]))

return address_info

def cpu_load():

return psutil.cpu_percent(interval=0.1)

def ram_available():

return psutil.virtual_memory().available

def ac_connected():

return psutil.sensors_battery().power_plugged

def show_sensors():

print("Python version: {0.major}.{0.minor}".format(python_version()))

for address in ip_addresses():

print("IP addresses: {0[1]} ({0[0]})".format(address))

print("CPU Load: {:.1f}".format(cpu_load()))

print("RAM Available: {} MiB".format(ram_available() / 1024**2))

print("AC Connected: {}".format(ac_connected()))

def command_line(argv):

program_name, *arguments = argv

if not arguments:

show_sensors()

elif arguments and arguments[0] == '--help':

print(HELP_TEXT.format(program_name=program_name))

return

else:

raise ValueError("Unknown arguments {}".format(arguments))

if __name__ == '__main__':

command_line(sys.argv)

Listing 1-6

sensors_argv.py – cli using manual checking of argv

The command_line(...) function is not overly complicated, but this is a very simple program. You can easily imagine situations where there are multiple flags allowed in any order and configurable variables being significantly more complex. This is only practically possible because there is no ordering or parsing of values involved. Some helper functionality is available in the standard library to make it easier to create more involved command-line utilities.

argparse

The argparse module is the standard method for parsing command-line arguments without depending on external libraries. It makes handling the complex situations alluded to earlier significantly less complicated; however, as with many libraries that offer developers choices, its interface is rather difficult to remember. Unless you’re writing command-line utilities regularly, it’s likely to be something that you read the documentation of every time you need to use it.

The model that argparse follows is that the programmer creates an explicit parser by instantiating argparse.ArgumentParser with some basic information about the program, then calling functions on that parser to add new options. Those functions specify what the option is called, what the help text is, any default values, as well as how the parser should handle it. For example, some arguments are simple flags, like --dry-run; others are additive, like -v, -vv, and -vvv; and yet others take an explicit value, like --config config.ini.

We aren’t using any parameters in our program just yet, so we skip over adding these options and have the parser parse the arguments from sys.argv . The result of that function call is the information it has gleaned from the user. Some basic handling is also done at this stage, such as handling --help, which displays an autogenerated help screen based on the options that were added.

Our command-line program looks like Listing 1-7, when written using argparse.

#!/usr/bin/env python

# coding: utf-8

import argparse

import socket

import sys

import psutil

def python_version():

return sys.version_info

def ip_addresses():

hostname = socket.gethostname()

addresses = socket.getaddrinfo(socket.gethostname(), None)

address_info = []

for address in addresses:

address_info.append((address[0].name, address[4][0]))

return address_info

def cpu_load():

return psutil.cpu_percent(interval=0.1)

def ram_available():

return psutil.virtual_memory().available

def ac_connected():

return psutil.sensors_battery().power_plugged

def show_sensors():

print("Python version: {0.major}.{0.minor}".format(python_version()))

for address in ip_addresses():

print("IP addresses: {0[1]} ({0[0]})".format(address))

print("CPU Load: {:.1f}".format(cpu_load()))

print("RAM Available: {} MiB".format(ram_available() / 1024**2))

print("AC Connected: {}".format(ac_connected()))

def command_line(argv):

parser = argparse.ArgumentParser(

description='Displays the values of the sensors',

add_help=True,

)

arguments = parser.parse_args()

show_sensors()

if __name__ == '__main__':

command_line(sys.argv)

Listing 1-7

sensors_argparse.py – cli using the standard library module argparse

click

Click is an add-on module that simplifies the process of creating command-line interfaces on the assumption that your interface is broadly similar to the standard that people expect. It makes for a significantly more natural flow when creating command-line interfaces and encourages you toward intuitive interfaces.

Whereas argparse requires the programmer to specify the options that are available when constructing a parser, click uses decorators on methods to infer what the parameters should be. This approach is a little less flexible, but easily handles 80% of typical use cases. If you’re writing a command-line interface, you generally want to follow the lead of other tools, so it is intuitive for the end-user.

As click isn’t in the standard library, we need to install it into our environment. Like psutil, click is a code dependency, not a development tool, so we install it as follows:

> pipenv install click

As we only have one primary command and no options, click only requires two lines of code to be added, an import and the @click.command(...) decorator. The print(...) calls should all be replaced with click.echo(...), but this isn’t strictly required. The result is shown as Listing 1-8. click.echo is a helper function that behaves like print, but also handles situations where there is a mismatch in character encodings, and intelligently strips or retains color and formatting markers depending on the capabilities of the terminal that called the program and whether the output is being piped elsewhere.

#!/usr/bin/env python

# coding: utf-8

import socket

import sys

import click

import psutil

def python_version():

return sys.version_info

def ip_addresses():

hostname = socket.gethostname()

addresses = socket.getaddrinfo(socket.gethostname(), None)

address_info = []

for address in addresses:

address_info.append((address[0].name, address[4][0]))

return address_info

def cpu_load():

return psutil.cpu_percent(interval=0.1)

def ram_available():

return psutil.virtual_memory().available

def ac_connected():

return psutil.sensors_battery().power_plugged

@click.command(help="Displays the values of the sensors")

def show_sensors():

click.echo("Python version: {0.major}.{0.minor}".format(python_version()))

for address in ip_addresses():

click.echo("IP addresses: {0[1]} ({0[0]})".format(address))

click.echo("CPU Load: {:.1f}".format(cpu_load()))

click.echo("RAM Available: {} MiB".format(ram_available() / 1024**2))

click.echo("AC Connected: {}".format(ac_connected()))

if __name__ == '__main__':

show_sensors()

Listing 1-8

sensors_click.py – cli using the contributed library click

It also has many utility functions which make creating more complex interfaces easier and compensate for nonstandard terminal environment on end-user systems. For example, if we decided to make the headers bold in the show_sensors command , in click we can use the secho (...) command, for echoing to the terminal with styling information. A version that styles headings is shown as Listing 1-9.

@click.command(help="Displays the values of the sensors")

def show_sensors():

click.secho("Python version: ", bold=True, nl=False)

click.echo("{0.major}.{0.minor}".format(python_version()))

for address in ip_addresses():

click.secho("IP addresses: ", bold=True, nl=False)

click.echo("{0[1]} ({0[0]})".format(address))

click.secho("CPU Load: ", bold=True, nl=False)

click.echo("{:.1f}".format(cpu_load()))

click.secho("RAM Available: ", bold=True, nl=False)

click.echo("{} MiB".format(ram_available() / 1024**2))

click.secho("AC Connected: ", bold=True, nl=False)

click.echo("{}".format(ac_connected()))

Listing 1-9

Extract from sensors_click_bold.py

The secho(...) function prints some information to the screen with the formatting specified . The nl= argument allows us to specify if a new line should be printed or not. If you’re not using click, the simplest method would be

BOLD = '33[1m'

END = '33[0m'

def show_sensors():

print(BOLD + "Python version:" + END + " ({0.major}.{0.minor})".format(python_version()))

for address in ip_addresses():

print(BOLD + "IP addresses: " + END + "{0[1]} ({0[0]})".format(address))

print(BOLD + "CPU Load:" + END + " {:.1f}".format(cpu_load()))

print(BOLD + "RAM Available:" + END + "{} MiB".format(ram_available() / 1024**2))

print(BOLD + "AC Connected:" + END + " {}".format(ac_connected()))

Click also provides transparent support for autocomplete in terminals and a number of other useful functions. We will revisit these later in the book when we expand on this interface.

Pushing the boundaries

We’ve looked at using Jupyter and IPython for doing prototyping, but sometimes we need to run prototype code on a specific computer, rather than the one we’re using for day-to-day development work. This could be because the computer has a peripheral or some software we need, for example.

This is mainly a matter of comfort; editing and running code on a remote machine can vary from slightly inconvenient to outright difficult, especially when there are differences in the operating system.

In the preceding examples, we’ve run all the code locally. However, we are planning to run the final code on a Raspberry Pi as that’s where we attach our specialized sensors. As an embedded system, it has significant hardware differences, both in terms of performance and peripherals.

Remote kernels

Testing this code would require running a Jupyter environment on a Raspberry Pi and connecting to that over HTTP or else connecting over SSH and interacting with the Python interpreter manually. This is suboptimal, as it requires ensuring that the Raspberry Pi has open ports for Jupyter to bind to and requires manually synchronizing the contents of notebooks between the local and remote hosts using a tool like scp. This is even more of a problem with real-world examples. It’s hard to imagine opening a port on a server and connecting to Jupyter there to test log analysis code.

Instead, it is possible to use the pluggable kernel infrastructure of Jupyter and IPython to connect a locally running Jupyter notebook to one of many remote computers. This allows testing of the same code transparently on multiple machines and with minimal manual work.

When Jupyter displays its list of potential execution targets, it is listing its list of known kernel specifications. When a kernel specification has been selected, an instance of that kernel is created and linked to the notebook. It is possible to connect to remote machines and manually start an individual kernel for your local Jupyter instance to connect to. However, this is rarely an effective use of time. When we ran pipenv run ipython kernel install at the start of this chapter, we were creating a new kernel specification for the current environment and installing that into the list of known kernel specifications.

To add kernel specifications that use remote hosts, we can use the helper utility remote_ikernel. We should install this to the same location as Jupyter, as it is a helper for Jupyter rather than a specific development tool for this environment.

> pip install --user remote_ikernel

We then need to set up the environment and kernel helper program on the remote host. Connect to the Raspberry Pi (or another machine that we want to send code to) and create a pipenv on that computer as we did earlier:

rpi> python -m pip install --user pipenv

rpi> mkdir development-testing

rpi> cd development-testing

rpi> pipenv install ipykernel

Tip

Some low-performance hosts, such as Raspberry Pis, may make installing ipython_kernel frustratingly slow. In this case, you may consider using the package manager’s version of ipython_kernel instead. The ipython kernel does require many support libraries which may take some time to install on a low-powered computer. In that case, you could set up the environment as

rpi> sudo apt install python3-ipykernel

rpi> pipenv --three --site-packages

Alternatively, if y0ou are using the Raspberry Pi, there is a repository of precompiled wheels at https://www.piwheels.org which can be enabled by adding the following new source to your Pipfile, in addition to the existing one:

[[source]]

url = "https://www.piwheels.org/simple"

name = "piwheels"

verify_ssl = true

You would then install the ipython_kernel package as normal using pipenv install. If you’re using a Raspberry Pi running Raspbian, you should always add piwheels to your Pipfile, as Raspbian comes preconfigured to use PiWheels globally. Not listing it in your Pipfile can cause installations to fail.

This will install the IPython kernel program on the Raspberry Pi machine; however, we still need to install it on our host machine. To start with, we’ll install a kernel pointing at the pipenv environment that we’ve created. After this, the Raspberry Pi will have two kernels available, one for the system Python install and one called development-testing for our environment. After installing the kernel, we can view the configuration file for the specification:

rpi> pipenv run ipython kernel install --user --name=development-testing

Installed kernelspec development-testing in /home/pi/.local/share/jupyter/kernels/development-testing

> cat /home/pi/.local/share/jupyter/kernels/development-testing/kernel.json

{

"argv": [

"/home/pi/.local/share/virtualenvs/development-testing-nbi70cWI/bin/python",

"-m",

"ipykernel_launcher",

"-f",

"{connection_file}"

"display_name": "development-testing",

"language": "python"

}

This output shows us how Jupyter would run the kernel if it were installed on that computer. We can use the information from this specification to create a new remote_ikernel specification on our development machine that points at the same environment as the development-testing kernel on the Raspberry Pi.

The preceding kernel specification lists how the kernel is started on the Raspberry Pi. We can verify this by testing the command over SSH to the Raspberry Pi, for example, by changing -f {connection_file} to --help to show the help text.

rpi> /home/pi/.local/share/virtualenvs/development-testing-nbi70cWI/bin/python -m ipykernel –help

We can now return to our development computer and create the remote kernel specification, as follows:

> remote_ikernel manage --add --kernel_cmd="/home/pi/.local/share/virtualenvs/development-testing-nbi70cWI/bin/python

-m ipykernel_launcher -f {connection_file}"

--name="development-testing" --interface=ssh --host=pi@raspberrypi --workdir="/home/pi/developmenttesting" --language=python

It looks a bit intimidating, spanning five lines of text, but it can be broken up:

The --kernel_cmd parameter is the contents of the argv section from the kernel spec file. Each line is space separated and without the individual quotation marks. This is the command that starts the kernel itself.
The --name parameter is the equivalent of display_name from the original kernel spec. This is what will be shown in Jupyter when you select this kernel, alongside SSH information. It doesn’t have to match the remote kernel’s name that you’ve copied from, it’s just for your reference.
The --interface and --host parameters define how to connect to the remote machine. You should ensure that passwordless¹⁵ SSH is possible to this machine so that Jupyter can set up connections.
The --workdir parameter is the default working directory that the environment should use. I recommend setting this to be the directory that contains your remote Pipfile.
The --language parameter is the value of the language value from the original kernel spec to differentiate different programming languages.

Tip

If you’re having difficulty connecting to the remote kernel, you can try opening a shell using Jupyter on the command line. This often shows useful error messages. Find the name of the kernel using jupyter kernelspec list and then use that with jupyter console:

> jupyter kernelspec list

Available kernels:

advancedpython C:UsersmicroAppDataRoamingjupyterkernelsadvancedpython

rik_ssh_pi_raspberrypi_developmenttesting C:UsersmicroAppDataRoamingjupyterkernels ik_ssh_pi_raspberrypi_developmenttesting

> jupyter console --kernel= rik_ssh_pi_raspberrypi_developmenttesting

In [1]:

At this point, when we reenter the Jupyter environment, we see a new kernel available that matches the connection information we supplied. We can then select that kernel and execute commands that require that environment,¹⁶ with the Jupyter kernel system taking care of connecting to the Raspberry Pi and activating the environment in ~/development-testing.

Developing code that cannot be run locally

There are some useful sensors available on the Raspberry Pi; these provide the actual data that we are interested in collecting. In other use cases, this might be information gathered by calling custom command-line utilities, introspecting a database, or making local API calls.

This isn’t a book on how to make the most of the Raspberry Pi, so we will gloss over much of the detail of exactly how it does its work, but suffice it to say that there is a large amount of documentation and support for doing exciting things using Python. In this case, there is a library that we want to use that provides a function to retrieve both temperature and relative humidity from a sensor that can be added to the board. Like many other tasks, this is relatively slow (it can take a good portion of a second to measure) and requires a specific environment (an external sensor being installed) to execute. In that way, it is similar to monitoring active processes on a web server by communicating through their management ports.

To begin with, we add the Adafruit DHT¹⁷ library to our environment. We have currently got copies of a pipfile both on the Raspberry Pi and locally. The remote copy only contains the dependency on ipykernel, which is already in the local copy, so it’s safe to overwrite the remote file with the one we’ve created locally. As we know that the DHT library is only useful on Raspberry Pis, we can restrict it so that it only installed on Linux machines with ARM processors, using the conditional dependency syntax:¹⁸

> pipenv install "Adafruit-CircuitPython-DHT ; 'arm' in platform_machine"

This results in the Pipfile and Pipfile.lock files being updated to include this dependency. We want to make use of these dependencies on the remote host, so we must copy these files across and install them using Pipenv. It would be possible to run this command in both environments, but that risks mistakes creeping in. Pipenv assumes that you use the same version of Python for both development and deployment, in keeping with its philosophy of avoiding problems during deployment. For that reason, if you’re planning to deploy to a set version of Python, you should use that for development locally.

However, if you do not want to install unusual versions of Python in your local environment, or if you’re targeting multiple different machines, it is possible to deactivate this check. To do so, remove the python_version line from the end of your Pipfile. This allows your environment to be deployed to any Python version. However, you should ensure that you’re aware of what versions you need to support and test accordingly.

Copy both Pipfile and Pipfile.lock files to the remote host using scp (or your tool of choice), and then on the remote machine, run pipenv install with the --deploy flag. --deploy instructs pipenv only to proceed if the exact versions match, which is very useful for deploying a known-good environment from one machine to another.

rpi> cd /home/pi/development-testing

rpi> pipenv install --deploy

Be aware, however, that if you’ve created your Pipfile on a different operating system or a different CPU architecture (such as files created on a standard laptop and installed on a Raspberry Pi), it is possible that the pinned packages will not be suitable when deploying them on another machine. In this case, it is possible to relock the dependencies without triggering version upgrades by running pipenv lock --keep-outdated.

You now have the specified dependencies available in the remote environment. If you’ve relocked the files, you should transfer the changed lock file back and store it, so you can redeploy in future without having to regenerate this file. At this stage, you can connect to the remote server through your Jupyter client and begin prototyping. We’re looking to add the humidity sensor, so we’ll use the library we just added and can now receive a valid humidity percentage.

The Raspberry Pi that I copied these files to has a DHT22 sensor connected to pin D4, as demonstrated in Figure 1-8. This sensor is readily available from Raspberry Pi or general electronics suppliers. If you don’t have one to hand, then try an alternative command that demonstrates that the code is running on the Pi, such as platform.uname().

../images/481001_1_En_1_Chapter/481001_1_En_1_Fig8_HTML.jpg — Figure 1-8
Jupyter connected to a remote Raspberry Pi

This notebook is stored locally on your development machine, not on the remote server. It can be migrated into being a Python script using nbconvert , in the same way as before. However, before we do that, we can also change the kernel back to our local instance to check that the code behaves correctly there. The objective is to create code that works on both environments, returning either the humidity or a placeholder value.

../images/481001_1_En_1_Chapter/481001_1_En_1_Fig9_HTML.jpg — Figure 1-9
Demonstration of the same code being run on the local machine

Figure 1-9 demonstrates that the code is not suitable for all environments. We would very much like to be able to run at least some of the code locally, so we can adjust our code to take the limitations of other platforms into account. When this has been converted to the more general function form, it will look something like

def get_relative_humidity():

try:

# Connect to a DHT22 sensor on GPIO pin 4

from adafruit_dht import DHT22

from board import D4

except (ImportError, NotImplementedError):

# No DHT library results in an ImportError.

# Running on an unknown platform results in a NotImplementedError

# when getting the pin

return None

return DHT22(D4).humidity

This allows for the function to be called on any machine, unless it has a temperature and humidity sensor connected to pin D4 and to return a None anywhere else.

The completed script

Listing 1-10 shows the completed script. There are still hurdles to overcome to ensure that this is a useful library, most notably the fact that the show_sensors function is doing formatting of the values. We don’t want to integrate formatting into the data sources at this point as we want to be sure the raw values are available to other interfaces. This is something we will look at in a later chapter.

#!/usr/bin/env python

# coding: utf-8

import socket

import sys

import click

import psutil

def python_version():

return sys.version_info

def ip_addresses():

hostname = socket.gethostname()

addresses = socket.getaddrinfo(socket.gethostname(), None)

address_info = []

for address in addresses:

address_info.append((address[0].name, address[4][0]))

return address_info

def cpu_load():

return psutil.cpu_percent(interval=0.1) / 100.0

def ram_available():

return psutil.virtual_memory().available

def ac_connected():

return psutil.sensors_battery().power_plugged

def get_relative_humidity():

try:

# Connect to a DHT22 sensor on GPIO pin 4

from adafruit_dht import DHT22

from board import D4

except (ImportError, NotImplementedError):

# No DHT library results in an ImportError.

# Running on an unknown platform results in a NotImplementedError

# when getting the pin

return None

return DHT22(D4).humidity

@click.command(help="Displays the values of the sensors")

def show_sensors():

click.echo("Python version: {0.major}.{0.minor}".format(python_version()))

for address in ip_addresses():

click.echo("IP addresses: {0[1]} ({0[0]})".format(address))

click.echo("CPU Load: {:.1%}".format(cpu_load()))

click.echo("RAM Available: {:.0f} MiB".format(ram_available() / 1024**2))

click.echo("AC Connected: {!r}".format(ac_connected()))

click.echo("Humidity: {!r}".format(get_relative_humidity()))

if __name__ == '__main__':

show_sensors()

Listing 1-10

The final version of our script from this chapter

Summary

This concludes the chapter on prototyping; in the following chapters, we will build on the data extraction functions we’ve created here to create libraries and tools that follow Python best practice. We have followed the path from playing around with a library up to the point of having a working shell script that has genuine utility. As we continue, it will develop to better fit our end goal of distributed data aggregation.

The tips we’ve covered here can be useful at many points in the software development life cycle, but it’s important not to be inflexible and only follow a process. While these methods are effective, sometimes opening the REPL or using pdb (or even plain print(...) calls) will be more straightforward than setting up a remote kernel. It is not possible to pick the best way of approaching a problem unless you’re aware of the options.

To recap:

1.
Jupyter is an excellent tool for exploring libraries and doing initial prototyping of their use.
2.
There are special-purpose debuggers available for Python that can be integrated easily into your workflow using the breakpoint() function and environment variables.
3.
Pipenv helps you to define version requirements that are kept up to date, involve minimal specification, and facilitate reproducible builds.
4.
The library click allows for simple command-line interfaces in an idiomatic Python style.
5.
Jupyter’s kernel system allows for seamless integration of multiple programming languages running both locally and on other computers into a single development flow.

Additional resources

Each of the tools we’ve used in this chapter has a lot of depth to it, while we’ve only skimmed the surface to achieve our ends.

The Pipenv documentation at https://pipenv.pypa.io/en/latest/ has a lot of useful explanation on customizing pipenv to work as you want, specifically with regard to customizing virtual environment creation and integration into existing processes. If you’re new to pipenv but have used virtual environments a lot, then this has good documentation to help you bridge the gap.
If you’re interested in prototyping other programming languages in Jupyter, I’d recommend you read through the Jupyter documentation at https://jupyter.readthedocs.io/en/latest/ – especially the kernels section.
For information on Raspberry Pis and compatible sensors, I recommend the CircuitPython project’s documentation on the Raspberry Pi: https://learn.adafruit.com/circuitpython-on-raspberrypi-linux.

Footnotes

You’ll be glad of this the first time you accidentally close the window and lose the code you’re working on.

Some text editors integrate a terminal precisely to cut down on this kind of context switching.

Pdb allows you to step through each iteration of a list comprehension, as you would do with a loop. This is useful when you have existing code that you are trying to diagnose a problem with, but frustrating when the list comprehension is incidental to your debugging.

These can all be abbreviated, as shown in bold. step becomes s, prettyprint becomes pp, etc.

I once so badly misunderstood a bug that I overused debug until the pdb prompt looked like ((((((Pdb)))))). This is an antipattern as it’s very easy to accidentally lose your place; try and use conditional breakpoints if you find yourself in a similar situation.

You may find documentation recommending import pdb; pdb.set_trace(). This is an older style that is still quite common, but does the same thing albeit without some of the configurability and less readable.

In fact, many people prefer to create a virtual environment just for Jupyter and add that to the system path, to avoid any risk of version conflicts in their global namespace.

Some editors, such as the professional version of PyCharm IDE and Microsoft’s VSCode editor, have begun to offer a partial equivalent to the notebook interface from within the IDE. They don’t have all the functionality available, but it’s surprisingly good.

✔ indicates that our requirements are met, ❌ indicates that they are not, and ⚠ represents that our requirements are met, but with a poor user experience.

This means that if your cell ends with an assignment, it won’t show the value being assigned. This is because assignments in Python do not evaluate to a variable. It’s common to explicitly show this with version = sys.version_info

version

While you could also use Python 3.8’s “walrus” operator, (version := sys.version_info), as that does evaluate to the value being assigned, it looks rather strange so I recommend against using it for a stand-alone assignment. This operator is best used in the condition of loops and if statements, where it looks a lot more natural as the parentheses are not required in such cases.

Part of the world-routable IPv6 address has been censored in these screenshots.

This shortcut only works if the variable is available to the kernel, so you may find you have to run the cell that defines it before you can use the autocompletion. If you’re overwriting the same variable name with different data, then you may see the wrong information, but I’d recommend against doing this where possible as it can be confusing.

IDEs and editors that offer notebook compatibility usually have a feature to do this from within the editor window, too.

This is usually pronounced “dunder main” for “double underscore” as saying “underscore” four times adds 12 syllables and feels silly.

Use ssh-copy-id user@host to set this up automatically, rather than manually editing the authorized_hosts file.

If you prefer a console environment to the web environment of the Jupyter notebook, you can see a list of available kernels using jupyter kernelspec list and open an IPython shell connected to the specification of your choice with jupyter console --kernel kernelname.

This is part of Adafruit’s excellent CircuitPython ecosystem. They have a lot more information on these sensors and how to use them in a variety of projects at https://learn.adafruit.com/dht

This is defined by PEP508 at www.python.org/dev/peps/pep-0508/. There is a table on that page which lists the valid filters, although more may be added in future.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 1. Prototyping and environments

Create new playlist

Sign In

Sign Up

1. Prototyping and environments

Prototyping in Python

Prototyping with the REPL

Prototyping with a Python script

Prototyping with scripts and pdb

Post-mortem debugging

The breakpoint function

Prototyping with Jupyter

Notebooks

Prototyping in this chapter

Environment setup

Setting up a new project

Prototyping our scripts

Installing dependencies

Exporting to a .py file

Building a command-line interface

The sys module and argv

argparse

click

Pushing the boundaries

Remote kernels

Developing code that cannot be run locally

The completed script

Summary

Additional resources

Table of Contents for
1. Prototyping and environments