6

User Inputs and Outputs

The key purpose of software is to produce useful output. One simple type of output is text displaying some useful result. Python supports this with the print() function.

The input() function has a parallel with the print() function. The input() function reads text from a console, allowing us to provide data to our programs.

There are a number of other common ways to provide input to a program. Parsing the command line is helpful for many applications. We sometimes need to use configuration files to provide useful input. Data files and network connections are yet more ways to provide input. Each of these methods is distinct and needs to be looked at separately. In this chapter, we'll focus on the fundamentals of input() and print().

In this chapter, we'll look at the following recipes:

  • Using the features of the print() function
  • Using input() and getpass() for user input
  • Debugging with f"{value=}" strings
  • Using argparse to get command-line input
  • Using cmd to create command-line applications
  • Using the OS environment settings

It seems best to start with the print() function and show a number of the things it can do. After all, it's often the output from an application that creates the most value.

Using the features of the print() function

In many cases, the print() function is the first function we learn about. The first script is often a variation on the following:

>>> print("Hello, world.")
Hello, world.

The print() function can display multiple values, with helpful spaces between items.

When we write this:

>>> count = 9973
>>> print("Final count", count)
Final count 9973

We see that a space separator is included for us. Additionally, a line break, usually represented by the \n character, is printed after the values provided in the function.

Can we control this formatting? Can we change the extra characters that are supplied?

It turns out that there are some more things we can do with print().

Getting ready

Consider this spreadsheet, used to record fuel consumption on a large sailboat. It has rows that look like this:

date      engine on  fuel height on  engine off  fuel height off
--------  ---------  --------------  ----------  ---------------
10/25/13  08:24:00   29              13:15:00    27
10/26/13  09:12:00   27              18:25:00    22
10/28/13  13:21:00   22              06:25:00    14

Example of fuel use by a sailboat

For more information on this data, refer to the Removing items from a set – remove(), pop(), and difference and Slicing and dicing a list recipes in Chapter 4, Built-In Data Structures Part 1: Lists and Sets. Instead of a sensor inside the tank, the depth of fuel is observed through a glass panel on the side of the tank. Knowing the tank is approximately rectangular, with a full depth of about 31 inches and a volume of about 72 gallons, it's possible to convert depth to volume.
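The depth-to-volume conversion can be sketched as a simple proportion. This is only a minimal sketch, assuming the tank is a perfect rectangular prism so that volume scales linearly with depth. The function name depth_to_gallons() and the two constant names are illustrative and are not part of the recipe.

```python
# Approximate tank dimensions taken from the description above.
FULL_DEPTH_IN = 31.0    # full depth, in inches
FULL_VOLUME_GAL = 72.0  # full volume, in gallons


def depth_to_gallons(depth_in: float) -> float:
    """Convert a fuel depth in inches to gallons.

    Assumes a rectangular tank, so volume is proportional to depth.
    """
    return depth_in * FULL_VOLUME_GAL / FULL_DEPTH_IN


# A 2.0 inch drop in depth is roughly 4.6 gallons of fuel used.
print(f"{depth_to_gallons(2.0):.1f} gal")
```

The same function can convert a change in depth, since the relationship is linear.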

Here's an example of using this CSV data. This function reads the file and returns a list of fields built from each row:

from pathlib import Path
import csv
from typing import Dict, List

def get_fuel_use(source_path: Path) -> List[Dict[str, str]]:
    with source_path.open() as source_file:
        rdr = csv.DictReader(source_file)
        return list(rdr)

This example uses a given Path object to identify a file. The opened file is used to create a dictionary-based reader for the CSV file. The list of rows represents the spreadsheet as Python objects.

Here's an example of reading and printing rows from the CSV file:

>>> source_path = Path("data/fuel2.csv")
>>> fuel_use = get_fuel_use(source_path)
>>> for row in fuel_use:
...     print(row)
{'date': '10/25/13', 'engine on': '08:24:00', 'fuel height on': '29', 'engine off': '13:15:00', 'fuel height off': '27'}
{'date': '10/26/13', 'engine on': '09:12:00', 'fuel height on': '27', 'engine off': '18:25:00', 'fuel height off': '22'}
{'date': '10/28/13', 'engine on': '13:21:00', 'fuel height on': '22', 'engine off': '06:25:00', 'fuel height off': '14'}

We used a pathlib.Path object to define the location of the raw data. We evaluated the get_fuel_use() function to open and read the file with the given path. This function creates a list of rows from the source spreadsheet. Each line of data is represented as a Dict[str, str] object.

The output from print(), shown here in long lines, is hard to read. Let's look at how we can improve this output using additional features of the print() function.

How to do it...

We have two ways to control the print() formatting:

  • Set the inter-field separator string, sep, which has the single space character as its default value
  • Set the end-of-line string, end, which has the single \n character as its default value

We'll show several examples of changing sep and end. The examples are similar.

The default case looks like this. This example has no change to sep or end:

  1. Read the data:
    >>> fuel_use = get_fuel_use(Path("data/fuel2.csv"))
    
  2. For each item in the data, do any useful data conversions:
    >>> for leg in fuel_use:
    ...    start = float(leg["fuel height on"])
    ...    finish = float(leg["fuel height off"])
    
  3. Print labels and fields using the default values of sep and end:
    ...    print("On", leg["date"],
    ...    "from", leg["engine on"],
    ...    "to", leg["engine off"],
    ...    "change", start-finish, "in.")
    On 10/25/13 from 08:24:00 to 13:15:00 change 2.0 in.
    On 10/26/13 from 09:12:00 to 18:25:00 change 5.0 in.
    On 10/28/13 from 13:21:00 to 06:25:00 change 8.0 in.
    

When we look at the output, we can see where a space was inserted between each item. The \n character at the end of each collection of data items means that each print() function produces a separate line.

When preparing data, we might want to use a format that's similar to CSV, perhaps using a column separator that's not a simple comma. Here's an example using |:

>>> print("date", "start", "end", "depth", sep=" | ")
date | start | end | depth

This is a modification to step 3 of the recipe shown before:

  1. Print labels and fields using a string value of " | " for the sep parameter:
    ...    print(leg["date"], leg["engine on"],
    ...    leg["engine off"], start-finish, sep=" | ")
    10/25/13 | 08:24:00 | 13:15:00 | 2.0 
    10/26/13 | 09:12:00 | 18:25:00 | 5.0 
    10/28/13 | 13:21:00 | 06:25:00 | 8.0 
    

In this case, we can see that each column has the given separator string. Since there were no changes to the end setting, each print() function produces a distinct line of output.

Here's how we might change the default punctuation to emphasize the field name and value.

This is a modification to step 3 of the recipe shown before:

  1. Print labels and fields using a string value of "=" for the sep parameter and ', ' for the end parameter:
    ...    print("date", leg["date"], sep="=", end=", ")
    ...    print("on", leg["engine on"], sep="=", end=", ")
    ...    print("off", leg["engine off"], sep="=", end=", ")
    ...    print("change", start-finish, sep="=")
    date=10/25/13, on=08:24:00, off=13:15:00, change=2.0 
    date=10/26/13, on=09:12:00, off=18:25:00, change=5.0 
    date=10/28/13, on=13:21:00, off=06:25:00, change=8.0 
    

Since the string used at the end of the line was changed to ', ', each use of the print() function no longer produces separate lines. In order to see a proper end of line, the final print() function uses the default value for end. We could also have used an argument value of end='\n' to make the presence of the newline character explicit.

How it works...

We can imagine that print() has a definition something like this:

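import sys

def print_like(*args, sep=None, end=None, file=sys.stdout):
    if sep is None: sep = " "
    if end is None: end = "\n"
    # This sketch assumes at least one value; the real print() accepts none.
    arg_iter = iter(args)
    value = next(arg_iter)
    file.write(str(value))
    for value in arg_iter:
        file.write(sep)
        file.write(str(value))
    # The end string is written once, after all of the values.
    file.write(end)
    file.flush()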

This only has a few of the features of the actual print() function. The purpose is to illustrate how the separator and ending strings work. If no value is provided, the default value for the separator is a single space character, and the default value at end-of-line is a single newline character, "\n".

This print-like function creates an explicit iterator object, arg_iter. Using next(arg_iter) allows us to treat the first item as special, since it won't have a separator in front of it. The for statement then iterates through the remaining argument values, inserting the separator string, sep, in front of each item after the first.

The end-of-line string, end, is printed after all of the values. It is always written. We can effectively turn it off by setting it to a zero-length string, "".
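To see the effect of a zero-length end string, it can help to capture the output in memory. The following is an illustrative sketch only: it uses io.StringIO as the file= target so the exact characters written by print() can be inspected. The variable names are made up for this example.

```python
import io

buffer = io.StringIO()

# end="" suppresses the newline, so the two calls share one line.
print("date", "10/25/13", sep="=", end="", file=buffer)
print("; change", 2.0, sep="=", file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```

Only the second print() contributes a newline, because it uses the default value for end.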

Using the print() function's sep and end parameters can get quite complex for anything more sophisticated than these simple examples. Rather than working with a complex sequence of print() function requests, we can use the format() method of a string, or use an f-string.
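As a rough sketch of the f-string alternative, the label=value line from the earlier example could be built as a single string and printed once. The function name format_leg() and the sample dictionary are assumptions made for illustration; the keys match the column names of the fuel-use data.

```python
from typing import Dict


def format_leg(leg: Dict[str, str]) -> str:
    """Build one output line for a row of the fuel-use data."""
    start = float(leg["fuel height on"])
    finish = float(leg["fuel height off"])
    return (
        f"date={leg['date']}, on={leg['engine on']}, "
        f"off={leg['engine off']}, change={start - finish}"
    )


sample = {
    "date": "10/25/13",
    "engine on": "08:24:00",
    "fuel height on": "29",
    "engine off": "13:15:00",
    "fuel height off": "27",
}
print(format_leg(sample))
```

Building the whole line first keeps the punctuation in one place, instead of spreading it across several sep and end arguments.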

There's more...

The sys module defines the two standard output files that are always available: sys.stdout and sys.stderr. In the general case, the print() function is a handy wrapper around stdout.write().

We can use the file= keyword argument to write to the standard error file instead of writing to the standard output file:

>>> import sys
>>> print("Red Alert!", file=sys.stderr)

We've imported the sys module so that we have access to the standard error file. We used this to write a message that would not be part of the standard output stream.

Because these two files are always available, using OS file redirection techniques often works out nicely. When our program's primary output is written to sys.stdout, it can be redirected at the OS level. A user might enter a command line like this:

python3 myapp.py <input.dat >output.dat

This will provide the input.dat file as the input to sys.stdin. When a Python program writes to sys.stdout, the output will be redirected by the OS to the output.dat file.
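A program designed for this kind of redirection typically reads from sys.stdin and writes to sys.stdout. Here is a minimal sketch of that pattern; the function name number_lines() and its behavior are hypothetical, chosen only to show the idea. In-memory files stand in for input.dat and output.dat.

```python
import io
import sys
from typing import TextIO


def number_lines(source: TextIO = sys.stdin, target: TextIO = sys.stdout) -> int:
    """Copy lines from source to target, prefixing each with its number.

    Returns the count of lines copied.
    """
    count = 0
    for count, line in enumerate(source, start=1):
        print(count, line.rstrip("\n"), sep=": ", file=target)
    return count


# In-memory files stand in for the redirected input.dat and output.dat.
demo_in = io.StringIO("first\nsecond\n")
demo_out = io.StringIO()
copied = number_lines(demo_in, demo_out)
```

When called with no arguments, a function like this works with whatever files the OS has connected to standard input and standard output.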

In some cases, we need to open additional files. In that case, we might see programming like this:

>>> from pathlib import Path
>>> target_path = Path("data")/"extra_detail.log"
>>> with target_path.open('w', encoding='utf-8') as target_file:
...     print("Some detailed output", file=target_file)
...     print("Ordinary log")
Ordinary log

In this example, we've opened a specific path for the output and assigned the open file to target_file using the with statement. We can then use this as the file= value in a print() function to write to this file. Because a file is a context manager, leaving the with statement means that the file will be closed properly; all of the OS resources will be released from the application. All file operations should be wrapped in a with statement context to ensure that the resources are properly released.

In large, long-running applications like web servers, the failure to close files and release resources is termed a "leak." A memory leak, for example, can arise when files are not closed properly and buffers remain allocated. Using a with statement assures that resources are released, eliminating a potential source of resource management problems.

See also

  • Refer to the Debugging with f"{value=}" strings recipe.
  • For more information on the input data in this example, refer to the Removing items from a set – remove(), pop(), and difference and Slicing and dicing a list recipes in Chapter 4, Built-In Data Structures Part 1: Lists and Sets.
  • For more information on file operations in general, refer to Chapter 8, More Advanced Class Design.

Using input() and getpass() for user input

Some Python scripts depend on gathering input from a user. There are several ways to do this. One popular technique is to use the console to prompt a user for input.

There are two relatively common situations:

  • Ordinary input: We can use the input() function for this. This will provide a helpful echo of the characters being entered.
  • Secure, no echo input: This is often used for passwords. The characters entered aren't displayed, providing a degree of privacy. We use the getpass() function in the getpass module for this.

The input() and getpass() functions are just two implementation choices for reading from the console. It turns out that getting the string of characters is only the first step in gathering valid, useful data. The input also needs to be validated.

When gathering input from a user there are several tiers of considerations for us to make, including the following:

  1. The user interaction: This is the process of writing a prompt and reading input characters from the user.
  2. Validation: The user's input needs to be checked to see whether it belongs in the expected domain of values. We might be looking for digits, yes/no values, or days of the week. In most cases, there are two parts to the validation tier:
    • We check whether the input fits some general domain – for example, numbers.
    • We check whether the input fits some more specific subdomain. For example, this might include a check to see whether the number is greater than or equal to zero, or between zero and six.
  3. Validating the input in some larger context to ensure that it's consistent with other inputs. For example, we can check whether a collection of inputs represents a date, and that the date is prior to today.

Gathering user input isn't a trivial operation. However, Python provides several libraries to help us implement the required tiers of input validation.

Above and beyond these techniques, we'll look at some other approaches in the Using argparse to get command-line input recipe later in this chapter.

Getting ready

We'll look at a technique for reading a complex structure from a person. In this case, we'll use year, month, and day as separate items. These items are then combined to create a complete date.

Here's a quick example of user input that omits all of the validation considerations. This is poor design:

from datetime import date
 
def get_date1() -> date:
    year = int(input("year: "))
    month = int(input("month [1-12]: "))
    day = int(input("day [1-31]: "))
    result = date(year, month, day)
    return result

This illustrates how easy it is to use the input() function. This will behave badly when the user enters an invalid date. Raising an exception for bad data isn't an ideal user experience. The recipe will take a different approach than this example.

We often need to wrap this in additional processing to make it more useful. The calendar is complex, and we'd hate to accept February 31 without warning a user that it is not a proper date.

How to do it...

  1. If the input is a password or something equally subject to redaction, the input() function isn't the best choice. If passwords or other secrets are involved, then use the getpass.getpass() function. This means we need the following import when secrets are involved:
    from getpass import getpass
    

    Otherwise, when secret input is not required, we'll use the built-in input() function, and no additional import is required.

  2. Determine which prompt will be used. In our example, we provided a field name and a hint about the type of data expected as the prompt string argument to the input() or getpass() function. It can help to separate the input from the text-to-integer conversion. This recipe doesn't follow the snippet shown previously; it breaks the operation into two separate steps. First, get the text value:
    year_text = input("year: ")
    
  3. Determine how to validate each item in isolation. The simplest case is a single value with a single rule that covers everything. In more complex cases – like this one – each individual element is a number with a range constraint. In a later step, we'll look at validating the composite item:
    year = int(year_text)
    
  4. Wrap the input and validation into a while-try block that looks like this:
    year = None
    while year is None:
        year_text = input("year: ")
        try:
            year = int(year_text)
        except ValueError as ex:
            print(ex)
    

This applies a single validation rule, the int(year_text) expression, to ensure that the input is an integer. If the int() function works without raising an exception, the resulting year object is the desired integer. If the int() function raises an exception, this is reported with an error message. The while statement leads to a repeat of the input and conversion sequence of steps until the value of the year variable is not None.

Raising an exception for faulty input allows us the most flexibility. We can extend this with additional exception classes for other conditions the input must meet. In some cases, we may need to define our own unique customized exceptions for data validation.

In some cases, the error message can be printed to sys.stderr instead of sys.stdout. To do this, we could use print(ex, file=sys.stderr). Mixing standard output and standard error may not work out well because the OS-level buffering for these two files is sometimes different, leading to confusing output. It's often a good idea to stick to a single channel.

This processing only covers the year field. We still need to get values for the month and day fields. This means we'll need three nearly identical loops, one for each of the three fields of a complex date object. Rather than copying and pasting nearly identical code, we need to restructure this input-and-validate sequence into a separate function. We'll call the new function get_integer().

Here's the definition:

def get_integer(prompt: str) -> int:
    while True:
        value_text = input(prompt)
        try:
            value = int(value_text)
            return value
        except ValueError as ex:
            print(ex)

This function will use the built-in input() function to prompt the user for input. It uses the int() function to try and create an integer value. If the conversion works, the value is returned. If the conversion raises a ValueError exception, this is displayed to the user and the input is attempted again.
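One way to exercise this retry loop without typing at a console is to replace input() temporarily. The following sketch assumes the standard unittest.mock.patch() helper and a scripted sequence of replies. The get_integer() function is repeated here so the example is self-contained.

```python
from unittest.mock import patch


def get_integer(prompt: str) -> int:
    while True:
        value_text = input(prompt)
        try:
            return int(value_text)
        except ValueError as ex:
            print(ex)


# Simulate a user who types "abc" first, then "2021".
with patch("builtins.input", side_effect=["abc", "2021"]) as fake_input:
    year = get_integer("year: ")

print(year, fake_input.call_count)
```

The first reply raises ValueError inside the function, the message is printed, and the second reply is accepted.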

We can combine this into an overall process for getting the three integers of a date. This will involve the same while-try, but applied to the composite object. It will look like this:

def get_date2() -> date:
    while True:
        year = get_integer("year: ")
        month = get_integer("month [1-12]: ")
        day = get_integer("day [1-31]: ")
        try:
            result = date(year, month, day)
            return result
        except ValueError as ex:
            print(f"invalid, {ex}")

This uses individual while-try processing sequences in the get_integer() function to get the individual values that make up a date. Then it uses the date() constructor to create a date object from the individual fields. If the date object – as a whole – can't be built because the pieces are invalid, then the year, month, and day must be re-entered to create a valid date.

Given a year and a month, we can actually determine a slightly narrower range for the number of days. This is complex because months have different numbers of days, varying from 28 to 31, and February has a number of days that varies with the type of year.

We can compute the starting date of the next month and use a timedelta object to provide the number of days between the two dates:

day_1_date = date(year, month, 1)
if month == 12:
    next_year, next_month = year+1, 1 
else:
    next_year, next_month = year, month+1
day_end_date = date(next_year, next_month, 1)
stop = (day_end_date - day_1_date).days
day = get_integer(f"day [1-{stop}]: ")

This will compute the length of any given month for a given year. The algorithm works by computing the first day of a given year and month. It then computes the first day of the next month (which may be the first month of the next year).

The number of days between these dates is the number of days in the given month. The (day_end_date - day_1_date).days expression extracts the number of days from the timedelta object. This can be used to display a more helpful prompt for the number of days that are valid in a given month.
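The month-length computation can be wrapped in a small function and cross-checked. This is a sketch, and the name days_in_month() is made up for illustration. The check uses calendar.monthrange() from the standard library, which is a different technique that should give the same answer.

```python
import calendar
from datetime import date


def days_in_month(year: int, month: int) -> int:
    """Length of a month, using the first-of-next-month technique."""
    day_1_date = date(year, month, 1)
    if month == 12:
        next_year, next_month = year + 1, 1
    else:
        next_year, next_month = year, month + 1
    day_end_date = date(next_year, next_month, 1)
    return (day_end_date - day_1_date).days


# Cross-check against the standard library's calendar.monthrange().
for y, m in [(2019, 2), (2020, 2), (2020, 12)]:
    assert days_in_month(y, m) == calendar.monthrange(y, m)[1]
    print(y, m, days_in_month(y, m))
```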

How it works...

We need to decompose the input problem into several separate but closely related problems. We can imagine a tower of conversion steps. At the bottom layer is the initial interaction with the user. We identified two of the common ways to handle this:

  • input(): This prompts and reads from a user
  • getpass.getpass(): This prompts and reads passwords without an echo

These two functions provide the essential console interaction. There are other libraries that can provide more sophisticated interactions, if that's required. For example, the click project has sophisticated prompting capabilities. See https://click.palletsprojects.com/en/7.x/.
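As a sketch of the no-echo case, a small wrapper can prompt with getpass() until something is entered. The function name read_secret() and its reader parameter are assumptions for this example; the parameter exists only so a scripted function can stand in for the console.

```python
from getpass import getpass
from typing import Callable


def read_secret(
    prompt: str = "password: ",
    reader: Callable[[str], str] = getpass,
) -> str:
    """Prompt until a non-empty secret is entered."""
    while True:
        secret = reader(prompt)
        if secret:
            return secret
        print("A value is required.")


# A scripted reader stands in for the console during a test.
answers = iter(["", "s3cret"])
result = read_secret(reader=lambda prompt: next(answers))
```

In ordinary use, calling read_secret() with no arguments reads from the console without echoing the characters.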

On top of the foundation, we've built several tiers of validation processing. The tiers are as follows:

  • A general domain validation: This uses built-in conversion functions such as int() or float(). These raise ValueError exceptions for invalid text.
  • A subdomain validation: This uses an if statement to determine whether values fit any additional constraints, such as ranges. For consistency, this should also raise a ValueError exception if the data is invalid.
  • Composite object validation: This is application-specific checking. For our example, the composite object was an instance of datetime.date. This also tends to raise ValueError exceptions for dates that are invalid.
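The first two tiers can be combined in one function, sketched below. The name get_integer_in_range() and the reader parameter are hypothetical; the reader defaults to input() and is replaceable so the loop can be driven by scripted replies.

```python
from typing import Callable


def get_integer_in_range(
    prompt: str,
    low: int,
    high: int,
    reader: Callable[[str], str] = input,
) -> int:
    """Prompt until the text is an integer with low <= value <= high."""
    while True:
        text = reader(prompt)
        try:
            value = int(text)  # general domain: is it an integer?
            if not low <= value <= high:  # subdomain: is it in range?
                raise ValueError(f"{value} is not in {low}..{high}")
            return value
        except ValueError as ex:
            print(ex)


# Scripted replies: not a number, out of range, then valid.
replies = iter(["x", "13", "7"])
month = get_integer_in_range("month [1-12]: ", 1, 12, lambda p: next(replies))
```

Raising ValueError for the range check means a single except clause handles both tiers.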

There are a lot of potential kinds of constraints that might be imposed on values. For example, we might want only valid OS process IDs, called PIDs. This requires checking the /proc/<pid> path on most Linux systems.
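A check of this kind might be sketched as follows. The function name pid_exists_procfs() and the proc_root parameter are assumptions for illustration; the parameter lets the example run against a temporary directory instead of the real /proc.

```python
import tempfile
from pathlib import Path


def pid_exists_procfs(pid: int, proc_root: Path = Path("/proc")) -> bool:
    """Return True if proc_root contains a directory named for the PID."""
    return pid > 0 and (proc_root / str(pid)).is_dir()


# Demonstrate with a temporary directory standing in for /proc.
with tempfile.TemporaryDirectory() as tmp:
    fake_proc = Path(tmp)
    (fake_proc / "4242").mkdir()
    found = pid_exists_procfs(4242, fake_proc)
    missing = pid_exists_procfs(4243, fake_proc)
```

A process can exit immediately after a check like this, so the result is only a snapshot.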

For BSD-based systems such as macOS, the /proc filesystem doesn't exist. Instead, something like the following needs to be done to determine if a PID is valid:

>>> import subprocess
>>> status = subprocess.run(["ps", str(PID)], check=True, text=True)

For Windows, the command would look like this:

>>> status = subprocess.run(
...     ["tasklist", "/fi", f'"PID eq {PID}"'], check=True, text=True)

Either of these two commands would need to be part of the input validation to ensure that the user is entering a proper PID value. This check can only be made safely when the value of the PID variable is a number.

There's more...

We have several alternatives for user input that involve slightly different approaches. We'll look at these two topics in detail:

  • Complex text: This will involve the simple use of input() with clever parsing of the source text.
  • Interaction via the cmd module: This involves a more complex class and somewhat simpler parsing.

We'll start by looking at ways to process more complex text using more sophisticated parsing.

Complex text parsing

A simple date value requires three separate fields. A more complex date-time that includes a time zone offset from UTC involves seven separate fields: year, month, day, hour, minute, second, and time zone. Prompting for each individual field can be tedious for the person entering all those details. The user experience might be improved by reading and parsing a complex string rather than a large number of individual fields:

from datetime import date, datetime

def get_date3() -> date:
    while True:
        raw_date_str = input("date [yyyy-mm-dd]: ")
        try:
            input_date = datetime.strptime(
                raw_date_str, "%Y-%m-%d").date()
            return input_date
        except ValueError as ex:
            print(f"invalid date, {ex}")

We've used the strptime() function to parse a time string in a given format. We've emphasized the expected date format in the prompt that's provided in the input() function. The datetime module provides a ValueError exception for data that's not in the right format as well as for non-dates that are in the right format; 2019-2-31, for example, also raises a ValueError exception.

This style of input requires the user to enter a more complex string. Since it's a single string that includes all of the details for a date, many people find it easier to use than a number of individual prompts.

Note that both techniques – gathering individual fields and processing a complex string – depend on the underlying input() function.
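The same idea extends to the seven-field date-time mentioned above. The following sketch assumes one particular text layout, with a T separator and a numeric UTC offset. The function name parse_timestamp() is made up for this example.

```python
from datetime import datetime


def parse_timestamp(text: str) -> datetime:
    """Parse text like 2021-03-04T05:06:07-0500 into an aware datetime.

    Raises ValueError when the text doesn't match the format.
    """
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")


stamp = parse_timestamp("2021-03-04T05:06:07-0500")
print(stamp.isoformat())
```

A function like this could be wrapped in the same while-try loop used by get_date3().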

Interaction via the cmd module

The cmd module includes the Cmd class, which can be used to build an interactive interface. This takes a dramatically different approach to the notion of user interaction. It does not rely on using input() explicitly.

We'll look at this closely in the Using cmd to create command-line applications recipe.
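To give a flavor of the approach, here is a very small sketch of a Cmd subclass. The class name Greeter and its two commands are invented for illustration. The example calls onecmd() to run single command lines, so it does not start the interactive prompt loop.

```python
import cmd
import io


class Greeter(cmd.Cmd):
    """A tiny command interpreter with two commands."""

    prompt = "] "

    def do_greet(self, arg: str) -> None:
        """greet NAME: print a greeting."""
        self.stdout.write(f"Hello, {arg or 'world'}\n")

    def do_quit(self, arg: str) -> bool:
        """quit: leave the interpreter."""
        return True  # A true value tells cmdloop() to stop.


# onecmd() runs a single command line without starting the prompt loop.
out = io.StringIO()
shell = Greeter(stdout=out)
stop_after_greet = shell.onecmd("greet sailor")
stop_after_quit = shell.onecmd("quit")
```

An interactive session would create the object without the stdout argument and call its cmdloop() method instead.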

See also

In the reference material for the SunOS operating system, which is now owned by Oracle, there is a collection of commands that prompt for different kinds of user inputs:

https://docs.oracle.com/cd/E19683-01/816-0210/6m6nb7m5d/index.html

Specifically, all of these commands beginning with ck are for gathering and validating user input. This could be used to define a module of input validation rules:

  • ckdate: This prompts for and validates a date
  • ckgid: This prompts for and validates a group ID
  • ckint: This displays a prompt, verifies, and returns an integer value
  • ckitem: This builds a menu, prompts for, and returns a menu item
  • ckkeywd: This prompts for and validates a keyword
  • ckpath: This displays a prompt, verifies, and returns a pathname
  • ckrange: This prompts for and validates an integer
  • ckstr: This displays a prompt, verifies, and returns a string answer
  • cktime: This displays a prompt, verifies, and returns a time of day
  • ckuid: This prompts for and validates a user ID
  • ckyorn: This prompts for and validates yes/no

This is a handy summary of the various kinds of user inputs used to support a command-line application. Another list of validation rules can be extracted from JSON schema definitions; this includes None, Boolean, integer, float, and string. A number of common string formats include date-time, time, date, email, hostname, IP addresses in version 4 and version 6 format, and URIs. Another source of user input types can be found in the definition of the HTML5 <input> tag; this includes color, date, datetime-local, email, file, month, number, password, telephone numbers, time, URL, and week-year.

Debugging with f"{value=}" strings

One of the most important debugging and design tools available in Python is the print() function. It offers a few formatting options; we looked at these in the Using the features of the print() function recipe.

What if we want more flexible output? We have more flexibility with f"string" formatting.

Getting ready

Let's look at a multistep process that involves some moderately complex calculations. We'll compute the mean and standard deviation of some sample data. Given these values, we'll locate all items that are more than one standard deviation above the mean:

>>> import statistics
>>> size = [2353, 2889, 2195, 3094, 
... 725, 1099, 690, 1207, 926, 
... 758, 615, 521, 1320]
>>> mean_size = statistics.mean(size)
>>> std_size = statistics.stdev(size)
>>> sig1 = round(mean_size + std_size, 1)
>>> [x for x in size if x > sig1]
[2353, 2889, 3094]

This calculation has several working variables. The final list comprehension involves three other variables, mean_size, std_size, and sig1. With so many values used to filter the size list, it's difficult to visualize what's going on. It's often helpful to know the steps in the calculation; showing the values of the intermediate variables can be very helpful.

How to do it...

The f"{name=}" string will have both the literal name= and the value for the name variable. Using this with a print() function looks as follows:

>>> print(
...     f"{mean_size=:.2f}, {std_size=:.2f}"
... )
mean_size=1414.77, std_size=901.10

We can use {name=} to put any variable into the f-string and see the value. These examples in the code above include :.2f as the format specification to show the values rounded to two decimal places. Another common suffix is !r; to show the internal representation of the object, we might use f"{name=!r}".
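The difference between the conversions is easiest to see with a string that has trailing whitespace. This is a small illustration with made-up variable names; it relies on the documented rule that "=" shows the repr() of the value unless a format specification or another conversion is given.

```python
label = "KM "
depth = 29.0

# With !s, the trailing space in label is easy to miss.
plain = f"{label=!s}, {depth=:.1f}"
# With !r, the quotes make the trailing space visible.
explicit = f"{label=!r}, {depth=!r}"

print(plain)
print(explicit)
```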

How it works...

For more background on the formatting options, refer to the Building complex strings with f"strings" recipe in Chapter 1, Numbers, Strings, and Tuples. Python 3.8 extends the basic f-string formatting to introduce the "=" formatting option to display a variable and the value of the variable.

There is a very handy extension to this capability. We can actually use any expression on the left of the "=" option in the f-string. This will show the expression and the value computed by the expression, providing us with even more debugging information.

There's more...

For example, we can use this more flexible format to include additional calculations that aren't simply local variables:

>>> print(
...     f"{mean_size=:.2f}, {std_size=:.2f},"
...     f" {mean_size+2*std_size=:.2f}"
... )
mean_size=1414.77, std_size=901.10, mean_size+2*std_size=3216.97

We've computed a new value, mean_size+2*std_size, that appears only inside the formatted output. This lets us display intermediate computed results without having to create an extra variable.

See also

  • Refer to the Building complex strings with f"strings" recipe in Chapter 1, Numbers, Strings, and Tuples, for more of the things that can be done with the format() method.
  • Refer to the Using the features of the print() function recipe earlier in this chapter for other formatting options.

Using argparse to get command-line input

For some applications, it can be better to get the user input from the OS command line without a lot of human interaction. We'd prefer to parse the command-line argument values and either perform the processing or report an error.

For example, at the OS level, we might want to run a program like this:

% python3 ch05_r04.py -r KM 36.12,-86.67 33.94,-118.40
From (36.12, -86.67) to (33.94, -118.4) in KM = 2887.35

The OS prompt is %. We entered a command of python3 ch05_r04.py. This command had an optional argument, -r KM, and two positional arguments of 36.12,-86.67 and 33.94,-118.40.

The program parses the command-line arguments and writes the result back to the console. This allows a very simple kind of user interaction. It keeps the program very simple. It allows the user to write a shell script to invoke the program or merge the program with other Python programs to create a higher-level program.

If the user enters something incorrect, the interaction might look like this:

% python3 ch05_r04.py -r KM 36.12,-86.67 33.94,-118asd
usage: ch05_r04.py [-h] [-r {NM,MI,KM}] p1 p2
ch05_r04.py: error: argument p2: could not convert string to float: '-118asd'

An invalid argument value of -118asd leads to an error message. The program stopped with an error status code. For the most part, the user can hit the up-arrow key to get the previous command line back, make a change, and run the program again. The interaction is delegated to the OS command line.

The name of the program – ch05_r04 – isn't too informative. We could perhaps have chosen a more informative name. The positional arguments are two (latitude, longitude) pairs. The output shows the distance between the two in the given units.

How do we parse argument values from the command line?

Getting ready

The first thing we need to do is to refactor our code to create three separate functions:

  • A function to get the arguments from the command line. To fit well with the argparse module, this function will almost always return an argparse.Namespace object.
  • A function that does the real work. It helps if this function is designed so that it makes no reference to the command-line options in any direct way. The intent is to define a function to be reused in a variety of contexts, one of which is with parameters from the command line.
  • A main function that gathers options and invokes the real work function with the appropriate argument values.

Here's our real work function, display():

from ch03_r05 import haversine, MI, NM, KM
def display(lat1: float, lon1: float, lat2: float, lon2: float, r: str) -> None:
    r_float = {"NM": NM, "KM": KM, "MI": MI}[r]
    d = haversine(lat1, lon1, lat2, lon2, r_float)
    print(f"From {lat1},{lon1} to {lat2},{lon2} in {r} = {d:.2f}")

We've imported the core calculation, haversine(), from another module. We've provided argument values to this function and used an f-string to display the final result message.

We've based this on the calculations shown in the examples in the Picking an order for parameters based on partial functions recipe in Chapter 3, Function Definitions:

The essential calculation yields the central angle, c, between two points. The angle is measured in radians. We convert it into a distance by multiplying by the Earth's mean radius in whatever unit we like. If we multiply angle c by a radius of 3,959 miles, we'll convert the angle to a distance in miles.

Note that we expect the distance conversion factor, r, to be provided as a string. This function will then map the string to an actual floating-point value, r_float. The "MI" string, for example, maps to the conversion value from radians to miles, MI, equal to 3,959.
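The haversine() function itself lives in another chapter, so here is a minimal sketch of the calculation described above. The MI value of 3,959 comes from the text; the NM and KM radius values are assumptions chosen for illustration, and the parameter names are ours rather than the imported module's:

```python
from math import asin, cos, radians, sin, sqrt

MI = 3959.0  # Earth's mean radius in miles (stated in the text)
NM = 3440.0  # assumed radius in nautical miles
KM = 6373.0  # assumed radius in kilometers


def haversine(
    lat_1: float, lon_1: float, lat_2: float, lon_2: float, R: float = NM
) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    d_lat = radians(lat_2) - radians(lat_1)
    d_lon = radians(lon_2) - radians(lon_1)
    a = (
        sin(d_lat / 2) ** 2
        + cos(radians(lat_1)) * cos(radians(lat_2)) * sin(d_lon / 2) ** 2
    )
    c = 2 * asin(sqrt(a))  # central angle, in radians
    return R * c  # scale the angle by the radius to get a distance


distance_km = haversine(36.12, -86.67, 33.94, -118.40, KM)
print(f"{distance_km:.2f}")
```

Changing only the final argument switches the units, which is why display() can map a short string such as "KM" to a radius and leave the rest of the calculation alone.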

Here's how the function looks when it's used inside Python:

>>> from ch05_r04 import display
>>> display(36.12, -86.67, 33.94, -118.4, 'NM')
From 36.12,-86.67 to 33.94,-118.4 in NM = 1558.53

This function has two important design features. The first feature is it avoids references to features of the argparse.Namespace object that's created by argument parsing. Our goal is to have a function that we can reuse in a number of alternative contexts. We need to keep the input and output elements of the user interface separate.

The second design feature is this function displays a value computed by another function. This is a helpful decomposition of the problem. We've separated the user experience of printed output from the essential calculation. This fits the general design pattern of separating processing into tiers and isolating the presentation tier from the application tier.

How to do it...

  1. Define the overall argument parsing function:
    def get_options(argv: List[str]) -> argparse.Namespace:
    
  2. Create the parser object:
        parser = argparse.ArgumentParser()
    
  3. Add the various types of arguments to the parser object. Sometimes this is difficult because we're still refining the user experience. It's difficult to imagine all the ways in which people will use a program and all of the questions they might have. For our example, we have two mandatory, positional arguments, and an optional argument:
    • Point 1 latitude and longitude
    • Point 2 latitude and longitude
    • Optional units of distance; we'll provide nautical miles as the default:
      parser.add_argument(
          "-r", action="store",
          choices=("NM", "MI", "KM"), default="NM")
      parser.add_argument(
          "p1", action="store", type=point_type)
      parser.add_argument(
          "p2", action="store", type=point_type)
      

      We've added optional and mandatory arguments. The first is the -r argument, which starts with a single dash, -, to mark it as optional. We could also supply a longer, double-dash version, such as --units; the two spellings would then be equivalent on the command line. Note that argparse names the resulting attribute after the first long option when one is present, so we've kept only -r here to match the options.r attribute used later in this recipe.

      The action of 'store' will store any value that follows the -r option in the command line. We've listed the three possible choices and provided a default. The parser will validate the input and write appropriate errors if the input isn't one of these three values.

      The mandatory arguments are named without a - prefix. These also use an action of 'store'; since this is the default action it doesn't really need to be stated. The function provided as the type argument is used to convert the source string to an appropriate Python object. We'll look at the point_type() validation function in this section.

  4. Evaluate the parse_args() method of the parser object created in step 2:
        options = parser.parse_args(argv)
    

By default, the parser uses the values from sys.argv, which are the command-line argument values entered by the user. Testing is much easier when we can provide an explicit argument value.

Here's the final function:

# Imports needed by the functions in this recipe.
import argparse
import sys
from typing import List, Tuple

def get_options(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", action="store",
                        choices=("NM", "MI", "KM"), default="NM")
    parser.add_argument("p1", action="store", type=point_type)
    parser.add_argument("p2", action="store", type=point_type)
    options = parser.parse_args(argv)
    return options

This relies on the point_type() function to both validate the string and convert it to an object of a more useful type. We might use type = int or type = float to convert to a number.

In our example, we used point_type() to convert a string to a (latitude, longitude) two-tuple. Here's the definition of this function:

def point_type(text: str) -> Tuple[float, float]:
    try:
        lat_str, lon_str = text.split(",")
        lat = float(lat_str)
        lon = float(lon_str)
        return lat, lon
    except ValueError as ex:
        raise argparse.ArgumentTypeError(ex)

The point_type() function parses the input values. First, it separates the two values at the , character. It attempts a floating-point conversion on each part. If the float() functions both work, we have a valid latitude and longitude that we can return as a pair of floating-point values.

If anything goes wrong, an exception will be raised. From this exception, we'll raise an ArgumentTypeError exception. This is caught by the argparse module and causes it to report the error to the user.
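Because get_options() accepts an explicit list, we can exercise both the successful path and the error path without touching the real command line. The following is a sketch that repeats compact versions of the two functions so it can run on its own. When a type function raises ArgumentTypeError, argparse prints the usage message to standard error and raises SystemExit with a status of 2:

```python
import argparse
from typing import List, Tuple


def point_type(text: str) -> Tuple[float, float]:
    try:
        lat_str, lon_str = text.split(",")
        return float(lat_str), float(lon_str)
    except ValueError as ex:
        raise argparse.ArgumentTypeError(ex)


def get_options(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", choices=("NM", "MI", "KM"), default="NM")
    parser.add_argument("p1", type=point_type)
    parser.add_argument("p2", type=point_type)
    return parser.parse_args(argv)


# Valid input: the strings are converted to two-tuples of floats.
good = get_options(["-r", "KM", "36.12,-86.67", "33.94,-118.40"])
print(good.r, good.p1, good.p2)

# Invalid input: argparse reports the problem and exits.
try:
    get_options(["36.12,-86.67", "33.94,-118asd"])
except SystemExit as ex:
    exit_status = ex.code
    print("parser exited with status", exit_status)
```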

Here's the main script that combines the option parser and the output display functions:

def main(argv: List[str] = sys.argv[1:]) -> None:
    options = get_options(argv)
    lat_1, lon_1 = options.p1
    lat_2, lon_2 = options.p2
    display(lat_1, lon_1, lat_2, lon_2, options.r)

if __name__ == "__main__":
    main()

This main script connects the user inputs to the displayed output. It does this by parsing the command-line options. Given the values provided by the user, these are decomposed into values required by the display() function, isolating the processing from the input parsing. Let's take a closer look at how argument parsing works.

How it works...

The argument parser works in three stages:

  1. Define the overall context by creating a parser object as an instance of ArgumentParser. We can provide information such as the overall program description. We can also provide a formatter and other options here.
  2. Add individual arguments with the add_argument() method. These can include optional arguments as well as required arguments. Each argument can have a number of features to provide different kinds of syntax. We'll look at a number of the alternatives in the There's more... section.
  3. Parse the actual command-line inputs. The parser's parse_args() method will use sys.argv automatically. We can provide an explicit value instead of the sys.argv values. The most common reason for providing an override value is to allow more complete unit testing.

Some simple programs will have a few optional arguments. A more complex program may have many optional arguments.

It's common to have a filename as a positional argument. When a program reads one or more files, the filenames are provided in the command line, as follows:

python3 some_program.py *.rst

We've used the Linux shell's globbing feature: the *.rst string is expanded into a list of all files that match the naming rule. This is a feature of the Linux shell, and it happens before the Python interpreter starts. This list of files can be processed using an argument defined as follows:

parser.add_argument('file', nargs='*')

All of the names on the command line that do not start with the - character will be collected into the file value in the object built by the parser.

We can then use the following:

for filename in options.file:
    process(filename)

This will process each file given in the command line.
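To see what the parser collects, we can hand parse_args() an explicit list in place of the shell's expanded filenames. This is a small sketch; the filenames are made up, and the print() call stands in for whatever process() would do:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("file", nargs="*")

# The list plays the role of the names the shell produced from *.rst.
options = parser.parse_args(["ch01.rst", "ch02.rst"])
for filename in options.file:
    print("would process", filename)

# With no names at all, nargs="*" yields an empty list.
empty = parser.parse_args([])
print(empty.file)
```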

For Windows programs, the shell doesn't glob filenames from wildcard patterns, and the application must deal with filenames that contain wildcard characters like "*" and "?" in them. The Python glob module can help with this. Also, the pathlib module can create Path objects, which include globbing features.

To support Windows, we might have something like this inside the get_options() function. This will expand file strings into all matching names:

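import platform
from pathlib import Path

if platform.system() == "Windows":
    # Expand each wildcard pattern into the matching Path objects.
    options.file = [
        name
        for wildcard in options.file
        for name in Path().glob(wildcard)
    ]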

This will expand all of the names in the file parameter to create a new list similar to the list created by the Linux and macOS platforms.

It can be difficult to refer to a file with an asterisk or question mark in the name. For example, a file named something*.py appears to be a pattern for globbing, not a single filename. We can enclose the pattern wildcard character in [] to create a name that matches literally: something[*].py will only match the file named something*.py.
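The standard library can build this kind of literal pattern for us. As a brief sketch, glob.escape() wraps each wildcard character in [], and fnmatch.fnmatchcase() lets us check the resulting pattern against names without needing real files:

```python
import fnmatch
import glob

literal_name = "something*.py"

# Wrap the wildcard character in [] so that it matches only itself.
pattern = glob.escape(literal_name)
print(pattern)

matches_literal = fnmatch.fnmatchcase("something*.py", pattern)
matches_other = fnmatch.fnmatchcase("something_else.py", pattern)
print(matches_literal, matches_other)
```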

Some applications have very complex argument parsing options. Very complex applications may have dozens of individual commands. As an example, look at the git version control program; this application uses dozens of separate commands, such as git clone, git commit, and git push. Each of these commands has unique argument parsing requirements. We can use argparse to create a complex hierarchy of these commands and their distinct sets of arguments.
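One way to build such a hierarchy is with the parser's add_subparsers() method. The following is a minimal sketch, not a model of git itself: the program name tool, the two commands, and their arguments are hypothetical choices made for illustration:

```python
import argparse

parser = argparse.ArgumentParser(prog="tool")
subparsers = parser.add_subparsers(dest="command", required=True)

# Each command gets its own parser with its own arguments.
clone = subparsers.add_parser("clone", help="copy a repository")
clone.add_argument("url")

commit = subparsers.add_parser("commit", help="record changes")
commit.add_argument("-m", "--message", required=True)

opts_1 = parser.parse_args(["clone", "https://example.com/repo.git"])
opts_2 = parser.parse_args(["commit", "-m", "Initial version"])
print(opts_1.command, opts_1.url)
print(opts_2.command, opts_2.message)
```

The dest="command" setting records which command the user chose, so a main() function can dispatch on options.command.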

There's more...

What kinds of arguments can we process? There are a lot of argument styles in common use. All of these variations are defined using the add_argument() method of a parser:

  • Simple options: The -o or --option arguments are often used to enable or disable features of a program. These are often implemented with add_argument() parameters of action='store_true', default=False. Sometimes the implementation is simpler if the application uses action='store_false', default=True. The choice of default value and stored value may simplify the programming, but it won't change the user's experience.
  • Simple options with non-trivial objects: The user sees these as simple -o or --option arguments. We may want to implement this using a more complex object that's not a simple Boolean constant. We can use action='store_const', const=some_object, and default=another_object. As modules, classes, and functions are also objects, a great deal of sophistication is available here.
  • Options with values: We showed -r unit as an argument that accepted the string name for the units to use. We implemented this with an action='store' assignment to store the supplied string value. We can also use the type=function option to provide a function that validates or converts the input into a useful form.
  • Options that increment a counter: One common technique is to have a debugging log that has multiple levels of detail. We can use action='count', default=0 to count the number of times a given argument is present. The user can provide -v for verbose output and -vv for very verbose output. The argument parser treats -vv as two instances of the -v argument, which means that the value will increase from the initial value of 0 to 2.
  • Options that accumulate a list: We might have an option for which the user might want to provide more than one value. We could, for example, use a list of distance values. We could have an argument definition with action='append', default=[]. This would allow the user to use -r NM -r KM to get a display in both nautical miles and kilometers. This would require a significant change to the display() function, of course, to handle multiple units in a collection.
  • Show the help text: If we do nothing, then -h and --help will display a help message and exit. This will provide the user with useful information. We can disable this or change the argument string, if we need to. This is a widely used convention, so it seems best to do nothing so that it's a feature of our program.
  • Show the version number: It's common to have --Version as an argument to display the version number and exit. We implement this with add_argument("--Version", action="version", version="v 3.14"). We provide an action of version and an additional keyword argument that sets the version to display.

This covers most of the common cases for command-line argument processing. Generally, we'll try to leverage these common styles of arguments when we write our own applications. If we strive to use simple, widely used argument styles, our users are somewhat more likely to understand how our application works.
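Several of these styles can be combined in one parser. The sketch below is illustrative only: the option names follow the bullets above, and dest="verbosity" is our own choice of attribute name for the counter:

```python
import argparse

parser = argparse.ArgumentParser()
# Simple option: present means True.
parser.add_argument("-o", "--option", action="store_true", default=False)
# Counter: -v is 1, -vv is 2.
parser.add_argument("-v", action="count", default=0, dest="verbosity")
# Accumulating list: each -r adds one value.
parser.add_argument("-r", action="append", default=[])
# Version: prints the string and exits.
parser.add_argument("--Version", action="version", version="v 3.14")

options = parser.parse_args(["-o", "-vv", "-r", "NM", "-r", "KM"])
print(options.option, options.verbosity, options.r)

defaults = parser.parse_args([])
print(defaults.option, defaults.verbosity, defaults.r)
```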

There are a few Linux commands that have even more complex command-line syntax. Some Linux programs, such as find or expr, have arguments that can't easily be processed by argparse. For these edge cases, we would need to write our own parser using the values of sys.argv directly.

See also

  • We looked at how to get interactive user input in the Using input() and getpass() for user input recipe.
  • We'll look at a way to add even more flexibility to this in the Using the OS environment settings recipe.

Using cmd to create command-line applications

There are several ways of creating interactive applications. The Using input() and getpass() for user input recipe looked at functions such as input() and getpass.getpass(). The Using argparse to get command-line input recipe showed us how to use argparse to create applications with which a user can interact from the OS command line.

We have a third way to create interactive applications: using the cmd module. This module will prompt the user for input, and then invoke a specific method of the class we provide.

Here's how the interaction will look. The user's input is the text that follows each ] prompt, for example, help:

A dice rolling tool. ? for help.
] help
Documented commands (type help <topic>):
========================================
dice  help  reroll  roll
Undocumented commands:
======================
EOF  quit
] help roll
Roll the dice. Use the dice command to set the number of dice.
] help dice
Sets the number of dice to roll.
] dice 5
Rolling 5 dice
] roll
[6, 6, 4, 3, 3]
]  

There's an introductory message from the application with a very short explanation. The application displays a prompt, ]. The user can then enter any of the available commands.

When we enter help as a command, we see a display of the commands. Four of the commands have further details. The other two, EOF and quit, have no further details available.

When we enter help roll, we see a brief summary for the roll command. Similarly, entering help dice displays information about the dice command. We entered the dice 5 command to set the number of dice, and then the roll command showed the results of rolling five dice. This shows the essence of how an interactive command-line application prompts for input, reads commands, evaluates, and prints a result.

Getting ready

The core feature of the cmd.Cmd application is a read-evaluate-print loop (REPL). This kind of application works well when there are a large number of individual state changes and a large number of commands to make those state changes.

We'll make use of a simple, stateful dice game. The idea is to have a handful of dice, some of which can be rolled and some of which are frozen. This means our Cmd class definition will have some attributes that describe the current state of the handful of dice.

We'll define a small domain of commands, to roll and re-roll a handful of dice. The interaction will look like the following:

] roll
[4, 4, 1, 6, 4, 6]
] reroll 2 3 5
[4, 4, 6, 5, 4, 5] (reroll 1)
] reroll 2 3 5
[4, 4, 1, 3, 4, 3] (reroll 2)

In this example, the roll command rolled six dice. The two reroll commands created a hand for a particular game by preserving the dice from positions 0, 1, and 4, and rerolling the dice in positions 2, 3, and 5.

How can we create stateful, interactive applications with an REPL?

How to do it...

  1. Import the cmd module to make the cmd.Cmd class definition available. We'll also need the random module to roll the dice:
    import cmd
    import random
    
  2. Define an extension to cmd.Cmd:
    class DiceCLI(cmd.Cmd):
    
  3. Define any initialization required in the preloop() method:
    def preloop(self):
        self.n_dice = 6
        self.dice = None  # no initial roll.
        self.reroll_count = 0
    

    This preloop() method is evaluated just once when the processing starts. The self argument is a requirement for methods within a class. For now, it's simply required syntax. In Chapter 7, Basics of Classes and Objects, we'll look at this more closely.

    Initialization can also be done in an __init__() method. Doing this is a bit more complex, though, because it must collaborate with the Cmd class initialization. It's easier to do initialization separately in the preloop() method.

  4. For each command, create a do_command() method. The name of the method will be the command, prefixed by do_. The user's input text after the command will be provided as an argument value to the method. The docstring comment in the method definition is the help text for the command. Here are two examples for the roll command and the reroll command:
    def do_roll(self, arg: str) -> bool:
        """Roll the dice. Use the dice command to set the number of dice."""
        self.dice = [random.randint(1, 6) for _ in range(self.n_dice)]
        print(f"{self.dice}")
        return False
    def do_reroll(self, arg: str) -> bool:
        """Reroll selected dice. Provide the 0-based positions."""
        try:
            # Build a list so that bad input raises ValueError here,
            # inside the try block, rather than later in the for loop.
            positions = list(map(int, arg.split()))
        except ValueError as ex:
            print(ex)
            return False
        for p in positions:
            self.dice[p] = random.randint(1, 6)
        self.reroll_count += 1
        print(f"{self.dice} (reroll {self.reroll_count})")
        return False
    
  5. Parse and validate the arguments to the commands that use them. The user's input after the command will be provided as the value of the first positional argument to the method. If the arguments are invalid, the method prints a message and returns, making no state change. If the arguments are valid, the method can continue past the validation step. In this example, the only validation is to be sure the number is valid. Additional checks could be added to ensure that the number is in a sensible range:
    def do_dice(self, arg: str) -> bool:
        """Sets the number of dice to roll."""
        try:
            self.n_dice = int(arg)
        except ValueError:
            print(f"{arg!r} is invalid")
            return False
        self.dice = None
        print(f"Rolling {self.n_dice} dice")
        return False
    
  6. Write the main script. This will create an instance of this class and execute the cmdloop() method:
    if __name__ == "__main__":
        game = DiceCLI()
        game.cmdloop()
    

We've created an instance of our DiceCLI subclass of Cmd. When we execute the cmdloop() method, the class will write any introductory messages that have been provided, write the prompt, and read a command.

How it works...

The Cmd class in the cmd module contains a large number of built-in features for displaying a prompt, reading input from a user, and then locating the proper method based on the user's input.

For example, when we enter dice 5, the built-in methods of the Cmd superclass will strip the first word from the input, dice, prefix this with do_, and then evaluate the method that implements the command. The argument value will be the string "5".

If we enter a command for which there's no matching do_ method, the command processor writes an error message. This is done automatically; we don't need to write any code to handle invalid commands.

Some methods, such as do_help(), are already part of the application. These methods will summarize the other do_* methods. When one of our methods has a docstring, this can be displayed by the built-in help feature.

The Cmd class relies on Python's facilities for introspection. An instance of the class can examine the method names to locate all of the methods that start with do_. They're available in a class-level __dict__ attribute. Introspection is an advanced topic, one that will be touched on in Chapter 8, More Advanced Class Design.
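We can watch this dispatching happen without an interactive session by calling the onecmd() method directly. In this sketch, EchoCLI and its echo command are hypothetical stand-ins for the dice application, and output is captured in a StringIO buffer through the stdout parameter of Cmd:

```python
import cmd
import io


class EchoCLI(cmd.Cmd):
    def do_echo(self, arg: str) -> bool:
        """Echo the argument text."""
        self.stdout.write(f"echo: {arg!r}\n")
        return False


buffer = io.StringIO()
cli = EchoCLI(stdout=buffer)

cli.onecmd("echo 5")   # located as do_echo(); arg is the string "5"
cli.onecmd("bogus 5")  # no do_bogus(), so the default() method reports it
output = buffer.getvalue()
print(output)

# Introspection: the command names are the do_* method names.
command_names = [n[3:] for n in cli.get_names() if n.startswith("do_")]
print(command_names)
```

The list of command names includes help as well as echo, because do_help() is inherited from the superclass.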

There's more...

The Cmd class has a number of additional places where we can add interactive features:

  • We can define specific help_*() methods that become part of the help topics.
  • When any of the do_* methods return a non-False value, the loop will end. We might want to add a do_quit() method that has return True as its body. This will end the command-processing loop.
  • We might provide a method named emptyline() to respond to blank lines. One choice is to do nothing quietly. Another common choice is to have a default action that's taken when the user doesn't enter a command.
  • The default() method is evaluated when the user's input does not match any of the do_* methods. This might be used for more advanced parsing of the input.
  • The postloop() method can be used to do some processing just after the loop finishes. This would be a good place to write a summary. This also requires a do_* method that returns a value – any non-False value – to end the command loop.

Also, there are a number of attributes we can set. These are class-level variables that would be peers of the method definitions:

  • The prompt attribute is the prompt string to write. For our example, we can do the following:
    class DiceCLI(cmd.Cmd):
        prompt = "] "
    
  • The intro attribute is the introductory message.
  • We can tailor the help output by setting doc_header, undoc_header, misc_header, and ruler attributes. These will all alter how the help output looks.

The goal is to be able to create a tidy class that handles user interaction in a way that's simple and flexible. This class creates an application that has a lot of features in common with Python's REPL. It also has features in common with many command-line programs that prompt for user input.
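Here is a sketch that pulls several of these hooks and attributes together. CounterCLI is a hypothetical stand-in for the dice application. To make the session repeatable, it sets use_rawinput to False and reads scripted commands from a StringIO object instead of the keyboard:

```python
import cmd
import io


class CounterCLI(cmd.Cmd):
    """A hypothetical application that only counts commands."""

    prompt = "] "
    intro = "A counting tool. ? for help."
    use_rawinput = False  # read commands from self.stdin, not input()

    def preloop(self) -> None:
        self.count = 0

    def do_bump(self, arg: str) -> bool:
        """Add one to the counter."""
        self.count += 1
        self.stdout.write(f"count is {self.count}\n")
        return False

    def do_quit(self, arg: str) -> bool:
        """Leave the application."""
        return True  # any non-False value ends cmdloop()

    def emptyline(self) -> bool:
        # Do nothing quietly; the inherited version repeats the last command.
        return False

    def postloop(self) -> None:
        self.stdout.write(f"Final count: {self.count}\n")


# Scripted input includes a blank line to exercise emptyline().
session_output = io.StringIO()
app = CounterCLI(stdin=io.StringIO("bump\n\nbump\nquit\n"), stdout=session_output)
app.cmdloop()
print(session_output.getvalue())
```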

One example of these interactive applications is the command-line FTP client in Linux. It has a prompt of ftp>, and it parses dozens of individual FTP commands. Entering help will show all of the various internal commands that are part of FTP interaction.

See also

  • We'll look at class definitions in Chapter 7, Basics of Classes and Objects, and Chapter 8, More Advanced Class Design.

Using the OS environment settings

There are several ways to look at inputs provided by the users of our software:

  • Interactive input: This is provided by the user on demand, as they interact with the application or service.
  • Command-line arguments: These are provided once, when the program is started.
  • Environment variables: These are OS-level settings. There are several ways these can be set:
    • Environment variables can be set at the command line, when the application starts.
    • They can be configured for a user in a configuration file for the user's selected shell. For example, if using zsh, these files are the ~/.zshrc file and the ~/.profile file. There can also be system-wide files, like /etc/zshrc. This makes the values persistent and less interactive than the command line. Other shells offer other filenames for settings and configurations unique to the shell.
    • In Windows, there's the Advanced Settings option, which allows someone to set a long-term configuration.
  • Configuration files: These vary widely by application. The idea is to edit the text configuration file and make these options or arguments available for long periods of time. These might apply to multiple users or even to all users of a given system. Configuration files often have the longest time span.

In the Using input() and getpass() for user input and Using cmd to create command-line applications recipes, we looked at interaction with the user. In the Using argparse to get command-line input recipe, we looked at how to handle command-line arguments. We'll look at configuration files in Chapter 13, Application Integration: Configuration.

The environment variables are available through the os module. How can we get an application's configuration based on these OS-level settings?

Getting ready

We may want to provide information of various types to a program via OS environment variable settings. There's a profound limitation here: the OS settings can only be string values. This means that many kinds of settings will require some code to parse the value and create proper Python objects from the string.

When we work with argparse to parse command-line arguments, this module can do some data conversions for us. When we use os to process environment variables, we'll have to implement the conversion ourselves.
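As a small illustration of that conversion work, here is a sketch of a helper that turns a "latitude,longitude" string into a pair of floats. The env_point() name is hypothetical, and a plain dictionary stands in for os.environ so that the example doesn't depend on the real environment:

```python
from typing import Mapping, Optional, Tuple


def env_point(
    environ: Mapping[str, str], name: str
) -> Optional[Tuple[float, float]]:
    """Convert an environment string such as '36.8,-76.3' to two floats."""
    text = environ.get(name)
    if text is None:
        return None  # the variable isn't set
    lat_str, lon_str = text.split(",")
    return float(lat_str), float(lon_str)


# In a real application we would pass os.environ here.
sample_env = {"HOME_PORT": "36.842952,-76.300171"}
home = env_point(sample_env, "HOME_PORT")
missing = env_point(sample_env, "AWAY_PORT")
print(home, missing)
```

A malformed value raises ValueError, which the caller would need to report to the user.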

In the Using argparse to get command-line input recipe, we wrapped the haversine() function in a simple application that parsed command-line arguments.

At the OS level, we created a program that worked like this:

% python3 ch05_r04.py -r KM 36.12,-86.67 33.94,-118.40
From (36.12, -86.67) to (33.94, -118.4) in KM = 2887.35

After using this version of the application for a while, we found that we're often using nautical miles to compute distances from where our boat is anchored. We'd really like to have default values for one of the input points as well as the -r argument.

Since a boat can be anchored in a variety of places, we need to change the default without having to tweak the actual code.

We'll set an OS environment variable, UNITS, with the distance units. We can set another variable, HOME_PORT, with the home point. We want to be able to do the following:

% export UNITS=NM
% export HOME_PORT=36.842952,-76.300171
% python3 ch05_r06.py 36.12,-86.67
From 36.12,-86.67 to 36.842952,-76.300171 in NM = 502.23

The units and the home point values are provided to the application via the OS environment. This can be set in a configuration file so that we can make easy changes. It can also be set manually, as shown in the example.

How to do it...

  1. Import the os module. The OS environment is available through this module:
    import os
    
  2. Import any other classes or objects needed for the application:
    from Chapter_03.ch03_r08 import haversine, MI, NM, KM
    from Chapter_05.ch05_r04 import point_type, display
    
  3. Define a function that will use the environment values as defaults for optional command-line arguments. The default set of arguments to parse comes from sys.argv, so it's important to also import the sys module:
    def get_options(argv: List[str] = sys.argv[1:]) -> argparse.Namespace:
    
  4. Gather default values from the OS environment settings. This includes any validation required:
    default_units = os.environ.get("UNITS", "KM")
    if default_units not in ("KM", "NM", "MI"):
        sys.exit(f"Invalid UNITS, {default_units!r} not KM, NM, or MI")
    default_home_port = os.environ.get("HOME_PORT")
    

    The sys.exit() function handles the error processing nicely. It will print the message and exit with a non-zero status code.

  5. Create the parser object. Provide any default values for the relevant arguments. This depends on the argparse module, which must also be imported:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-u", "--units",
        action="store", choices=("NM", "MI", "KM"), default=default_units
    )
    parser.add_argument("p1", action="store", type=point_type)
    parser.add_argument(
        "p2", nargs="?", action="store", type=point_type, default=default_home_port
    )
    options = parser.parse_args(argv)
    
  6. Do any additional validation to ensure that arguments are set properly. In this example, it's possible to have no value for HOME_PORT and no value provided for the second command-line argument. This requires an if statement and a call to sys.exit():
    if options.p2 is None:
        sys.exit("Neither HOME_PORT nor p2 argument provided.")
    
  7. Return the options object with the set of valid arguments:
    return options
    

This will allow the -u argument and the second point to be completely optional. The argument parser will use the configuration information to supply default values if these are omitted from the command line.

Use the Using argparse to get command-line input recipe for ways to process the options created by the get_options() function.
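Assembled into one function, the steps might look like the following sketch. Two things here are our own assumptions rather than part of the recipe: the environ parameter, which lets a test supply a dictionary in place of os.environ, and the compact copy of point_type(), which is included so the example runs on its own:

```python
import argparse
import os
import sys
from typing import List, Mapping, Optional, Tuple


def point_type(text: str) -> Tuple[float, float]:
    try:
        lat_str, lon_str = text.split(",")
        return float(lat_str), float(lon_str)
    except ValueError as ex:
        raise argparse.ArgumentTypeError(ex)


def get_options(
    argv: Optional[List[str]] = None,
    environ: Mapping[str, str] = os.environ,  # assumed parameter, for testing
) -> argparse.Namespace:
    if argv is None:
        argv = sys.argv[1:]
    default_units = environ.get("UNITS", "KM")
    if default_units not in ("KM", "NM", "MI"):
        sys.exit(f"Invalid UNITS, {default_units!r} not KM, NM, or MI")
    default_home_port = environ.get("HOME_PORT")

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-u", "--units", choices=("NM", "MI", "KM"), default=default_units
    )
    parser.add_argument("p1", type=point_type)
    # A string default is converted by the type function, like user input.
    parser.add_argument(
        "p2", nargs="?", type=point_type, default=default_home_port
    )
    options = parser.parse_args(argv)
    if options.p2 is None:
        sys.exit("Neither HOME_PORT nor p2 argument provided.")
    return options


example_env = {"UNITS": "NM", "HOME_PORT": "36.842952,-76.300171"}
options = get_options(["36.12,-86.67"], example_env)
print(options.units, options.p1, options.p2)
```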

How it works...

We've used the OS environment variables to create default values that can be overridden by command-line arguments. If the environment variable is set, that string is provided as the default to the argument definition. If the environment variable is not set, then an application-level default value is used.

In the case of the UNITS variable, in this example, the application uses kilometers as the default if the OS environment variable is not set.

This gives us three tiers of interaction:

  • We can define settings in a configuration file appropriate to the shell in use. For bash it is the .bashrc file; for zsh, it is the .zshrc file. For Windows, we can use the Windows Advanced Settings option to make a change that is persistent. This value will be used each time we log in or create a new command window.
  • We can set the OS environment interactively at the command line. This will last as long as our session lasts. When we log out or close the command window, this value will be lost.
  • We can provide a unique value through the command-line arguments each time the program is run.

Note that there's no built-in or automatic validation of the values retrieved from environment variables. We'll need to validate these strings to ensure that they're meaningful.

Also note that we've repeated the list of valid units in several places. This violates the Don't Repeat Yourself (DRY) principle. A global variable with a valid collection of values is a good improvement to make. (Python lacks formal constants, which are variables that cannot be changed. It's common to treat globals as if they are constants that should not be changed.)
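A sketch of that improvement follows. The VALID_UNITS, default_units(), and build_parser() names are hypothetical, and this version raises ValueError instead of calling sys.exit() so that the check is easier to test. The three results at the end show the precedence of the tiers: the command line overrides the environment, which overrides the application's built-in default:

```python
import argparse
from typing import Mapping

# One global, treated as a constant, replaces the repeated tuples.
VALID_UNITS = ("NM", "MI", "KM")


def default_units(environ: Mapping[str, str]) -> str:
    units = environ.get("UNITS", "KM")
    if units not in VALID_UNITS:
        raise ValueError(f"Invalid UNITS, {units!r} not in {VALID_UNITS}")
    return units


def build_parser(environ: Mapping[str, str]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-u", "--units", choices=VALID_UNITS, default=default_units(environ)
    )
    return parser


from_cli = build_parser({"UNITS": "MI"}).parse_args(["-u", "NM"]).units
from_env = build_parser({"UNITS": "MI"}).parse_args([]).units
from_app = build_parser({}).parse_args([]).units
print(from_cli, from_env, from_app)
```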

There's more...

The Using argparse to get command-line input recipe shows a slightly different way to handle the default command-line arguments available from sys.argv. The first of the arguments is the name of the Python application being executed and is not often relevant to argument parsing.

The value of sys.argv will be a list of strings:

['ch05_r06.py', '-u', 'NM', '36.12,-86.67']

We have to skip the initial value in sys.argv[0] at some point in the processing. We have two choices:

  • In this recipe, we discard the extra item as late as possible in the parsing process. The first item is skipped when providing sys.argv[1:] to the parser.
  • In the previous example, we discarded the value earlier in the processing. The main() function used options = get_options(sys.argv[1:]) to provide the shorter list to the parser.

Generally, the only relevant distinction between the two approaches is the number and complexity of the unit tests. This recipe will require a unit test that includes an initial argument string, which will be discarded during parsing.

See also

  • We'll look at numerous ways to handle configuration files in Chapter 13, Application Integration: Configuration.