The key purpose of software is to produce useful output. One simple type of output is text displaying some useful result. Python supports this with the print() function.
The input() function has a parallel with the print() function. The input() function reads text from a console, allowing us to provide data to our programs.
There are a number of other common ways to provide input to a program. Parsing the command line is helpful for many applications. We sometimes need to use configuration files to provide useful input. Data files and network connections are yet more ways to provide input. Each of these methods is distinct and needs to be looked at separately. In this chapter, we'll focus on the fundamentals of input() and print().
In this chapter, we'll look at the following recipes:
Using features of the print() function
Using input() and getpass() for user input
Debugging with f"{value=}" strings
Using argparse to get command-line input
Using cmd to create command-line applications
It seems best to start with the print() function and show a number of the things it can do. After all, it's often the output from an application that creates the most value.
In many cases, the print() function is the first function we learn about. The first script is often a variation on the following:
>>> print("Hello, world.")
Hello, world.
The print() function can display multiple values, with helpful spaces between items.
When we write this:
>>> count = 9973
>>> print("Final count", count)
Final count 9973
We see that a space separator is included for us. Additionally, a line break, usually represented by the \n character, is printed after the values provided in the function.
Can we control this formatting? Can we change the extra characters that are supplied?
It turns out that there are some more things we can do with print().
Consider this spreadsheet, used to record fuel consumption on a large sailboat. It has rows that look like this:
date | engine on | fuel height on | engine off | fuel height off
10/25/13 | 08:24:00 | 29 | 13:15:00 | 27
10/26/13 | 09:12:00 | 27 | 18:25:00 | 22
10/28/13 | 13:21:00 | 22 | 06:25:00 | 14
Example of fuel use by a sailboat
For more information on this data, refer to the Removing items from a set – remove(), pop(), and difference and Slicing and dicing a list recipes in Chapter 4, Built-In Data Structures Part 1: Lists and Sets. Instead of a sensor inside the tank, the depth of fuel is observed through a glass panel on the side of the tank. Knowing the tank is approximately rectangular, with a full depth of about 31 inches and a volume of about 72 gallons, it's possible to convert depth to volume.
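The depth-to-volume conversion described above can be sketched as a small function. This is only an illustration: it assumes the tank is a perfect rectangular prism, so volume is proportional to depth, and the names FULL_DEPTH_IN, FULL_VOLUME_GAL, and depth_to_gallons() are hypothetical and not part of the recipe.

```python
FULL_DEPTH_IN = 31.0    # approximate full depth in inches (from the text)
FULL_VOLUME_GAL = 72.0  # approximate full volume in gallons (from the text)


def depth_to_gallons(depth_in: float) -> float:
    """Convert an observed fuel depth (inches) to an approximate volume (gallons).

    Assumes a rectangular tank, so volume scales linearly with depth.
    """
    return depth_in * FULL_VOLUME_GAL / FULL_DEPTH_IN


# Fuel used on a leg that starts at 29 in. and ends at 27 in.
used = depth_to_gallons(29) - depth_to_gallons(27)
print(f"{used:.2f} gallons")
```

Under these assumptions, each inch of depth is a little over two gallons of fuel.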
Here's an example of using this CSV data. This function reads the file and returns a list of dictionaries, one built from each row:
from pathlib import Path
import csv
from typing import Dict, List

def get_fuel_use(source_path: Path) -> List[Dict[str, str]]:
    with source_path.open() as source_file:
        # Each row becomes a dictionary keyed by the column headings.
        rdr = csv.DictReader(source_file)
        return list(rdr)
This example uses a given Path object to identify a file. The opened file is used to create a dictionary-based reader for the CSV file. The list of rows represents the spreadsheet as Python objects.
Here's an example of reading and printing rows from the CSV file:
>>> source_path = Path("data/fuel2.csv")
>>> fuel_use = get_fuel_use(source_path)
>>> for row in fuel_use:
...     print(row)
{'date': '10/25/13', 'engine on': '08:24:00', 'fuel height on': '29', 'engine off': '13:15:00', 'fuel height off': '27'}
{'date': '10/26/13', 'engine on': '09:12:00', 'fuel height on': '27', 'engine off': '18:25:00', 'fuel height off': '22'}
{'date': '10/28/13', 'engine on': '13:21:00', 'fuel height on': '22', 'engine off': '06:25:00', 'fuel height off': '14'}
We used a pathlib.Path object to define the location of the raw data. We evaluated the get_fuel_use() function to open and read the file with the given path. This function creates a list of rows from the source spreadsheet. Each line of data is represented as a Dict[str, str] object.
The output from print(), shown here in long lines, can be challenging to read. Let's look at how we can improve this output using additional features of the print() function.
We have two ways to control the print() formatting:
sep, which has the single space character as its default value
end, which has the single \n character as its default value
We'll show several examples of changing sep and end. The examples are similar.
The default case looks like this. This example has no change to sep or end:
1. Read the data:
>>> fuel_use = get_fuel_use(Path("data/fuel2.csv"))
2. For each item in the data, convert the fuel heights to numbers:
>>> for leg in fuel_use:
...     start = float(leg["fuel height on"])
...     finish = float(leg["fuel height off"])
3. Print labels and fields, using the default values of sep and end:
...     print("On", leg["date"],
...         "from", leg["engine on"],
...         "to", leg["engine off"],
...         "change", start-finish, "in.")
On 10/25/13 from 08:24:00 to 13:15:00 change 2.0 in.
On 10/26/13 from 09:12:00 to 18:25:00 change 5.0 in.
On 10/28/13 from 13:21:00 to 06:25:00 change 8.0 in.
When we look at the output, we can see where a space was inserted between each item. The \n character at the end of each collection of data items means that each print() function produces a separate line.
When preparing data, we might want to use a format that's similar to CSV, perhaps using a column separator that's not a simple comma. Here's an example using |:
>>> print("date", "start", "end", "depth", sep=" | ")
date | start | end | depth
This is a modification to step 3 of the recipe shown before. Print the fields, using a string value of " | " for the sep parameter:
...     print(leg["date"], leg["engine on"],
...         leg["engine off"], start-finish, sep=" | ")
10/25/13 | 08:24:00 | 13:15:00 | 2.0
10/26/13 | 09:12:00 | 18:25:00 | 5.0
10/28/13 | 13:21:00 | 06:25:00 | 8.0
In this case, we can see that each column has the given separator string. Since there were no changes to the end setting, each print() function produces a distinct line of output.
Here's how we might change the default punctuation to emphasize the field name and value.
This is a modification to step 3 of the recipe shown before. Print labels and fields, using a string value of "=" for the sep parameter and ', ' for the end parameter:
...     print("date", leg["date"], sep="=", end=", ")
...     print("on", leg["engine on"], sep="=", end=", ")
...     print("off", leg["engine off"], sep="=", end=", ")
...     print("change", start-finish, sep="=")
date=10/25/13, on=08:24:00, off=13:15:00, change=2.0
date=10/26/13, on=09:12:00, off=18:25:00, change=5.0
date=10/28/13, on=13:21:00, off=06:25:00, change=8.0
Since the string used at the end of the line was changed to ', ', each use of the print() function no longer produces separate lines. In order to see a proper end of line, the final print() function relies on the default value for end. We could also have used an argument value of end='\n' to make the presence of the newline character explicit.
We can imagine that print() has a definition something like this:
import sys

def print_like(*args, sep=None, end=None, file=sys.stdout):
    if sep is None: sep = " "
    if end is None: end = "\n"
    # Unlike print(), this sketch needs at least one value in args.
    arg_iter = iter(args)
    value = next(arg_iter)
    file.write(str(value))
    for value in arg_iter:
        file.write(sep)
        file.write(str(value))
    file.write(end)
    file.flush()
This only has a few of the features of the actual print() function. The purpose is to illustrate how the separator and ending strings work. If no value is provided, the default value for the separator is a single space character, and the default value at end-of-line is a single newline character, "\n".
This print-like function creates an explicit iterator object, arg_iter. Using next(arg_iter) allows us to treat the first item as special, since it won't have a separator in front of it. The for statement then iterates through the remaining argument values, inserting the separator string, sep, in front of each item after the first.
The end-of-line string, end, is printed after all of the values. It is always written. We can effectively turn it off by setting it to a zero-length string, "".
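A short sketch of these rules follows. It uses an io.StringIO buffer as the file= target so the exact characters written can be inspected; the buffer and the sample values are illustrative only.

```python
import io

buffer = io.StringIO()

# No separator and no end-of-line string: the values are run together.
print("a", "b", "c", sep="", end="", file=buffer)

# A custom separator, with the default newline written at the end.
print(1, 2, 3, sep=", ", file=buffer)

text = buffer.getvalue()
print(repr(text))
```

The repr() of the captured text makes the single newline at the end visible; the first print() call contributed none.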
Using the print() function's sep and end parameters can get quite complex for anything more sophisticated than these simple examples. Rather than working with a complex sequence of print() function requests, we can use the format() method of a string, or use an f-string.
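As a sketch of the f-string alternative, the four print() calls from the name=value example can be collapsed into one formatted string. The leg dictionary here is a hypothetical row shaped like the ones returned by get_fuel_use(), written out literally so the example runs on its own.

```python
# A hypothetical row, shaped like the dictionaries from get_fuel_use().
leg = {
    "date": "10/25/13",
    "engine on": "08:24:00",
    "fuel height on": "29",
    "engine off": "13:15:00",
    "fuel height off": "27",
}
start = float(leg["fuel height on"])
finish = float(leg["fuel height off"])

# One f-string replaces four print() calls with sep= and end= settings.
line = (
    f"date={leg['date']}, on={leg['engine on']}, "
    f"off={leg['engine off']}, change={start - finish}"
)
print(line)
```

The punctuation now lives in one place, which tends to be easier to adjust than a series of sep= and end= arguments.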
The sys module defines the two standard output files that are always available: sys.stdout and sys.stderr. In the general case, the print() function is a handy wrapper around sys.stdout.write().
We can use the file= keyword argument to write to the standard error file instead of writing to the standard output file:
>>> import sys
>>> print("Red Alert!", file=sys.stderr)
We've imported the sys module so that we have access to the standard error file. We used this to write a message that would not be part of the standard output stream.
Because these two files are always available, using OS file redirection techniques often works out nicely. When our program's primary output is written to sys.stdout, it can be redirected at the OS level. A user might enter a command line like this:
python3 myapp.py <input.dat >output.dat
This will provide the input.dat file as the input to sys.stdin. When a Python program writes to sys.stdout, the output will be redirected by the OS to the output.dat file.
In some cases, we need to open additional files. In that case, we might see programming like this:
>>> from pathlib import Path
>>> target_path = Path("data")/"extra_detail.log"
>>> with target_path.open('w', encoding='utf-8') as target_file:
...     print("Some detailed output", file=target_file)
...     print("Ordinary log")
Ordinary log
In this example, we've opened a specific path for the output and assigned the open file to target_file using the with statement. We can then use this as the file= value in a print() function to write to this file. Because a file is a context manager, leaving the with statement means that the file will be closed properly; all of the OS resources will be released from the application. All file operations should be wrapped in a with statement context to ensure that the resources are properly released.
In large, long-running applications like web servers, the failure to close files and release resources is termed a "leak." A memory leak, for example, can arise when files are not closed properly and buffers remain allocated. Using a with statement assures that resources are released, eliminating a potential source of resource management problems.
Some Python scripts depend on gathering input from a user. There are several ways to do this. One popular technique is to use the console to prompt a user for input.
There are two relatively common situations:
Ordinary input: We can use the input() function for this. This will provide a helpful echo of the characters being entered.
Input with no echo, often used for passwords: We can use the getpass() function in the getpass module for this.
The input() and getpass() functions are just two implementation choices for reading from the console. It turns out that getting the string of characters is only the first step in gathering valid, useful data. The input also needs to be validated.
When gathering input from a user, there are several tiers of considerations for us to make, from the initial console interaction through to validating the values that were entered.
Gathering user input isn't a trivial operation. However, Python provides several libraries to help us implement the required tiers of input validation.
Above and beyond these techniques, we'll look at some other approaches in the Using argparse to get command-line input recipe later in this chapter.
We'll look at a technique for reading a complex structure from a person. In this case, we'll use year, month, and day as separate items. These items are then combined to create a complete date.
Here's a quick example of user input that omits all of the validation considerations. This is poor design:
from datetime import date

def get_date1() -> date:
    year = int(input("year: "))
    month = int(input("month [1-12]: "))
    day = int(input("day [1-31]: "))
    result = date(year, month, day)
    return result
This illustrates how easy it is to use the input() function. This will behave badly when the user enters an invalid date. Raising an exception for bad data isn't an ideal user experience. The recipe will take a different approach than this example.
We often need to wrap this in additional processing to make it more useful. The calendar is complex, and we'd hate to accept February 31 without warning a user that it is not a proper date.
For secret values, the input() function isn't the best choice. If passwords or other secrets are involved, then use the getpass.getpass() function. This means we need the following import when secrets are involved:
from getpass import getpass
Otherwise, when secret input is not required, we'll use the built-in input() function, and no additional import is required.
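The choice between the two readers can be sketched by making the reader a parameter. This is a minimal illustration, not part of the recipe: get_secret() and its reader parameter are hypothetical names, and the only rule applied is that the value must not be empty.

```python
from getpass import getpass
from typing import Callable


def get_secret(prompt: str, reader: Callable[[str], str] = getpass) -> str:
    """Prompt for a secret value; reader defaults to getpass (no echo)."""
    while True:
        value = reader(prompt)
        if value:
            return value
        # An empty response is treated as invalid in this sketch.
        print("A value is required.")
```

An application might call get_secret("Password: ") for a real prompt. Passing a different reader, such as a function that returns canned responses, makes the loop testable without a console.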
Prompt for the value with the input() or getpass() function. It can help to separate the input from the text-to-integer conversion. This recipe doesn't follow the snippet shown previously; it breaks the operation into two separate steps. First, get the text value, then convert it:
year_text = input("year: ")
year = int(year_text)
Wrap the input and conversion in a while-try block that looks like this:
year = None
while year is None:
    year_text = input("year: ")
    try:
        year = int(year_text)
    except ValueError as ex:
        print(ex)
This applies a single validation rule, the int(year_text) expression, to ensure that the input is an integer. If the int() function works without raising an exception, the resulting year object is the desired integer. If the int() function raises an exception, this is reported with an error message. The while statement leads to a repeat of the input and conversion sequence of steps until the value of the year variable is not None.
Raising an exception for faulty input allows us the most flexibility. We can extend this with additional exception classes for other conditions the input must meet. In some cases, we may need to define our own unique customized exceptions for data validation.
In some cases, the error message can be printed to sys.stderr instead of sys.stdout. To do this, we could use print(ex, file=sys.stderr). Mixing standard output and standard error may not work out well because the OS-level buffering for these two files is sometimes different, leading to confusing output. It's often a good idea to stick to a single channel.
This processing only covers the year field. We still need to get values for the month and day fields. This means we'll need three nearly identical loops, one for each field of a complex date object. Rather than copying and pasting nearly identical code, we need to restructure this input-and-validate sequence into a separate function. We'll call the new function get_integer().
Here's the definition:
def get_integer(prompt: str) -> int:
    while True:
        value_text = input(prompt)
        try:
            value = int(value_text)
            return value
        except ValueError as ex:
            print(ex)
This function will use the built-in input() function to prompt the user for input. It uses the int() function to try to create an integer value. If the conversion works, the value is returned. If the conversion raises a ValueError exception, this is displayed to the user and the input is attempted again.
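The same while-try loop can carry an additional rule, such as a range check. The following is a sketch under stated assumptions: get_integer_in_range() and its low, high, and reader parameters are hypothetical additions, and the range failure is reported by raising ValueError so both rules share one except clause.

```python
from typing import Callable


def get_integer_in_range(
    prompt: str,
    low: int,
    high: int,
    reader: Callable[[str], str] = input,
) -> int:
    """Prompt until the user supplies an integer with low <= value <= high."""
    while True:
        value_text = reader(prompt)
        try:
            value = int(value_text)  # rule 1: the text must be an integer
            if not low <= value <= high:  # rule 2: the integer must be in range
                raise ValueError(f"{value} is not in the range {low} to {high}")
            return value
        except ValueError as ex:
            print(ex)
```

Something like get_integer_in_range("month [1-12]: ", 1, 12) would then reject both "abc" and "13" before the date is ever constructed.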
We can combine this into an overall process for getting the three integers of a date. This will involve the same while-try, but applied to the composite object. It will look like this:
def get_date2() -> date:
    while True:
        year = get_integer("year: ")
        month = get_integer("month [1-12]: ")
        day = get_integer("day [1-31]: ")
        try:
            result = date(year, month, day)
            return result
        except ValueError as ex:
            # Show the problem so the user knows why the prompts repeat.
            problem = f"invalid, {ex}"
            print(problem)
This uses individual while-try processing sequences in the get_integer() function to get the individual values that make up a date. Then it uses the date() constructor to create a date object from the individual fields. If the date object, as a whole, can't be built because the pieces are invalid, then the year, month, and day must be re-entered to create a valid date.
Given a year and a month, we can actually determine a slightly narrower range for the number of days. This is complex because months have different numbers of days, varying from 28 to 31, and February has a number of days that varies with the type of year.
We can compute the starting date of the next month and use a timedelta object to provide the number of days between the two dates:
day_1_date = date(year, month, 1)
if month == 12:
    next_year, next_month = year+1, 1
else:
    next_year, next_month = year, month+1
day_end_date = date(next_year, next_month, 1)
stop = (day_end_date - day_1_date).days
day = get_integer(f"day [1-{stop}]: ")
This will compute the length of any given month for a given year. The algorithm works by computing the first day of a given year and month. It then computes the first day of the next month (which may be the first month of the next year).
The number of days between these dates is the number of days in the given month. The (day_end_date - day_1_date).days expression extracts the number of days from the timedelta object. This can be used to display a more helpful prompt for the number of days that are valid in a given month.
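The month-length calculation can be wrapped in a function and cross-checked against the standard library. In this sketch, days_in_month() is a hypothetical name for the algorithm described above; calendar.monthrange() is a standard library function that returns the weekday of the first day and the number of days in the month.

```python
import calendar
from datetime import date


def days_in_month(year: int, month: int) -> int:
    """Number of days in a month, using the next-month subtraction shown above."""
    day_1_date = date(year, month, 1)
    if month == 12:
        next_year, next_month = year + 1, 1
    else:
        next_year, next_month = year, month + 1
    day_end_date = date(next_year, next_month, 1)
    return (day_end_date - day_1_date).days


# Compare against calendar.monthrange(), which returns
# (weekday of the first day, number of days in the month).
for year, month in [(2019, 2), (2020, 2), (2019, 12)]:
    assert days_in_month(year, month) == calendar.monthrange(year, month)[1]
    print(year, month, days_in_month(year, month))
```

Either approach gives the upper bound for the day prompt; the subtraction version has the advantage of showing how timedelta objects work.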
We need to decompose the input problem into several separate but closely related problems. We can imagine a tower of conversion steps. At the bottom layer is the initial interaction with the user. We identified two of the common ways to handle this:
input(): This prompts and reads from a user
getpass.getpass(): This prompts and reads passwords without an echo
These two functions provide the essential console interaction. There are other libraries that can provide more sophisticated interactions, if that's required. For example, the click project has sophisticated prompting capabilities. See https://click.palletsprojects.com/en/7.x/.
On top of the foundation, we've built several tiers of validation processing. The tiers are as follows:
Data type validation: This uses built-in conversion functions such as int() or float(). These raise ValueError exceptions for invalid text.
Domain validation: This uses an if statement to determine whether values fit any additional constraints, such as ranges. For consistency, this should also raise a ValueError exception if the data is invalid.
Composite object validation: In this recipe, the composite object is an instance of datetime.date. This also tends to raise ValueError exceptions for dates that are invalid.
There are a lot of potential kinds of constraints that might be imposed on values. For example, we might want only valid OS process IDs, called PIDs. This requires checking the /proc/<pid> path on most Linux systems.
For BSD-based systems such as macOS, the /proc filesystem doesn't exist. Instead, something like the following needs to be done to determine if a PID is valid:
>>> import subprocess
>>> status = subprocess.run(["ps", str(PID)], check=True, text=True)
For Windows, the command would look like this:
>>> status = subprocess.run(
...     ["tasklist", "/fi", f'"PID eq {PID}"'], check=True, text=True)
Either of these two commands would need to be part of the input validation to ensure that the user is entering a proper PID value. This check can only be made safely when the value of the PID variable is a number.
We have several alternatives for user input that involve slightly different approaches. We'll look at these two topics in detail:
Complex text: This involves using input() with clever parsing of the source text.
Interaction via the cmd module: This involves a more complex class and somewhat simpler parsing.
We'll start by looking at ways to process more complex text using more sophisticated parsing.
A simple date value requires three separate fields. A more complex date-time that includes a time zone offset from UTC involves seven separate fields: year, month, day, hour, minute, second, and time zone. Prompting for each individual field can be tedious for the person entering all those details. The user experience might be improved by reading and parsing a complex string rather than a large number of individual fields:
from datetime import date, datetime

def get_date3() -> date:
    while True:
        raw_date_str = input("date [yyyy-mm-dd]: ")
        try:
            input_date = datetime.strptime(
                raw_date_str, "%Y-%m-%d").date()
            return input_date
        except ValueError as ex:
            print(f"invalid date, {ex}")
We've used the strptime() method to parse a time string in a given format. We've emphasized the expected date format in the prompt that's provided in the input() function. The datetime module raises a ValueError exception for data that's not in the right format as well as for non-dates that are in the right format; 2019-2-31, for example, also raises a ValueError exception.
This style of input requires the user to enter a more complex string. Since it's a single string that includes all of the details for a date, many people find it easier to use than a number of individual prompts.
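The same approach extends to the seven-field date-time mentioned above. The following sketch assumes one particular input format, with a T between the date and the time and a numeric offset such as -0500 at the end; parse_timestamp() is a hypothetical helper, not part of the recipe.

```python
from datetime import datetime, timedelta


def parse_timestamp(text: str) -> datetime:
    """Parse text such as '2019-02-28T13:45:10-0500' into an aware datetime."""
    # %z reads the UTC offset, so the result carries time zone information.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")


stamp = parse_timestamp("2019-02-28T13:45:10-0500")
print(stamp.isoformat())
print(stamp.utcoffset() == timedelta(hours=-5))
```

One strptime() call validates all seven fields at once, and an impossible date such as February 31 still raises ValueError, so the function fits inside the same while-try loop used by get_date3().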
Note that both techniques, gathering individual fields and processing a complex string, depend on the underlying input() function.
The cmd module includes the Cmd class, which can be used to build an interactive interface. This takes a dramatically different approach to the notion of user interaction. It does not rely on using input() explicitly.
We'll look at this closely in the Using cmd for creating command-line applications recipe.
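As a preview, here is a minimal sketch of a Cmd subclass. The EchoShell class and its two commands are hypothetical; the only library behavior relied on is that Cmd dispatches a line such as "echo hello" to a method named do_echo(), and that a true return value ends the command loop.

```python
import cmd


class EchoShell(cmd.Cmd):
    """A tiny, hypothetical command interpreter with two commands."""

    prompt = "] "

    def do_echo(self, arg: str) -> None:
        """echo TEXT: write TEXT back to the console."""
        print(arg)

    def do_quit(self, arg: str) -> bool:
        """quit: leave the command loop."""
        return True  # A true value tells cmdloop() to stop.


shell = EchoShell()
shell.onecmd("echo hello")   # dispatches to do_echo("hello")
stop = shell.onecmd("quit")  # dispatches to do_quit("")
```

The onecmd() method is used here so the example runs without a console. An interactive session would call shell.cmdloop() instead, which prompts, reads a line, and dispatches it until a command returns a true value.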
In the reference material for the SunOS operating system, which is now owned by Oracle, there is a collection of commands that prompt for different kinds of user inputs:
https://docs.oracle.com/cd/E19683-01/816-0210/6m6nb7m5d/index.html
Specifically, all of these commands beginning with ck are for gathering and validating user input. This could be used to define a module of input validation rules:
ckdate: This prompts for and validates a date
ckgid: This prompts for and validates a group ID
ckint: This displays a prompt, verifies, and returns an integer value
ckitem: This builds a menu, prompts for, and returns a menu item
ckkeywd: This prompts for and validates a keyword
ckpath: This displays a prompt, verifies, and returns a pathname
ckrange: This prompts for and validates an integer
ckstr: This displays a prompt, verifies, and returns a string answer
cktime: This displays a prompt, verifies, and returns a time of day
ckuid: This prompts for and validates a user ID
ckyorn: This prompts for and validates yes/no
This is a handy summary of the various kinds of user inputs used to support a command-line application. Another list of validation rules can be extracted from JSON schema definitions; this includes None, Boolean, integer, float, and string. A number of common string formats include date-time, time, date, email, hostname, IP addresses in version 4 and version 6 format, and URIs. Another source of user input types can be found in the definition of the HTML5 <input> tag; this includes color, date, datetime-local, email, file, month, number, password, telephone numbers, time, URL, and week-year.
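One entry from such a module of validation rules might look like the following sketch, loosely modeled on ckyorn. The name ck_yorn(), the accepted spellings, and the reader parameter are all assumptions made for illustration; this is not a port of the SunOS command.

```python
from typing import Callable


def ck_yorn(prompt: str, reader: Callable[[str], str] = input) -> bool:
    """Prompt until the user answers yes or no; return True for yes."""
    answers = {"y": True, "yes": True, "n": False, "no": False}
    while True:
        text = reader(prompt).strip().lower()
        try:
            return answers[text]
        except KeyError:
            print(f"invalid answer {text!r}; please enter yes or no")
```

The structure is the same while-try loop used by get_integer(); only the conversion rule changes, which suggests the other ck-style functions could follow the same pattern.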
One of the most important debugging and design tools available in Python is the print() function. There are some kinds of formatting options available; we looked at these in the Using features of the print() function recipe.
What if we want more flexible output? We have more flexibility with f"string" formatting.
Let's look at a multistep process that involves some moderately complex calculations. We'll compute the mean and standard deviation of some sample data. Given these values, we'll locate all items that are more than one standard deviation above the mean:
>>> import statistics
>>> size = [2353, 2889, 2195, 3094,
... 725, 1099, 690, 1207, 926,
... 758, 615, 521, 1320]
>>> mean_size = statistics.mean(size)
>>> std_size = statistics.stdev(size)
>>> sig1 = round(mean_size + std_size, 1)
>>> [x for x in size if x > sig1]
[2353, 2889, 3094]
This calculation has several working variables. The final list comprehension involves three other variables, mean_size, std_size, and sig1. With so many values used to filter the size list, it's difficult to visualize what's going on. It's often helpful to know the steps in the calculation; showing the values of the intermediate variables can be very helpful.
The f"{name=}" string will have both the literal name= and the value for the name variable. Using this with a print() function looks as follows:
>>> print(
... f"{mean_size=:.2f}, {std_size=:.2f}"
... )
mean_size=1414.77, std_size=901.10
We can use {name=} to put any variable into the f-string and see the value. These examples in the code above include :.2f as the format specification to show the values rounded to two decimal places. Another common suffix is !r; to show the internal representation of the object, we might use f"{name=!r}".
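A short sketch of the conversion suffixes follows, using hypothetical variables named name and count. One detail worth knowing: with no format specification, the = option already shows the repr() of the value, so !r matters most when a format specification is also present, and !s can be used to ask for the str() form instead. This requires Python 3.8 or later.

```python
name = "spam "
count = 3

# The default for the = option is the repr(), so the quotes
# and the trailing space are visible.
print(f"{name=}")

# !r asks for the repr() explicitly; !s asks for the str() form.
print(f"{name=!r}, {name=!s}")

# Whitespace around the = is preserved in the output.
print(f"{count = }")
```

Seeing the quotes around a string value is often what reveals a stray space or newline during debugging.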
For more background on the formatting options, refer to the Building complex strings with f"strings" recipe in Chapter 1, Numbers, Strings, and Tuples. Python 3.8 extends the basic f-string formatting to introduce the "=" formatting option to display a variable and the value of the variable.
There is a very handy extension to this capability. We can actually use any expression on the left of the "=" option in the f-string. This will show the expression and the value computed by the expression, providing us with even more debugging information.
For example, we can use this more flexible format to include additional calculations that aren't simply local variables:
>>> print(
... f"{mean_size=:.2f}, {std_size=:.2f},"
... f" {mean_size+2*std_size=:.2f}"
... )
mean_size=1414.77, std_size=901.10, mean_size+2*std_size=3216.97
We've computed a new value, mean_size+2*std_size, that appears only inside the formatted output. This lets us display intermediate computed results without having to create an extra variable.
Refer to the Building complex strings with f"strings" recipe in Chapter 1, Numbers, Strings, and Tuples, for more of the things that can be done with f-strings and the format() method.
For some applications, it can be better to get the user input from the OS command line without a lot of human interaction. We'd prefer to parse the command-line argument values and either perform the processing or report an error.
For example, at the OS level, we might want to run a program like this:
% python3 ch05_r04.py -r KM 36.12,-86.67 33.94,-118.40
From 36.12,-86.67 to 33.94,-118.4 in KM = 2887.35
The OS prompt is %. We entered a command of python3 ch05_r04.py. This command had an optional argument, -r KM, and two positional arguments of 36.12,-86.67 and 33.94,-118.40.
The program parses the command-line arguments and writes the result back to the console. This allows a very simple kind of user interaction. It keeps the program very simple. It allows the user to write a shell script to invoke the program or merge the program with other Python programs to create a higher-level program.
If the user enters something incorrect, the interaction might look like this:
% python3 ch05_r04.py -r KM 36.12,-86.67 33.94,-118asd
usage: ch05_r04.py [-h] [-r {NM,MI,KM}] p1 p2
ch05_r04.py: error: argument p2: could not convert string to float: '-118asd'
An invalid argument value of -118asd leads to an error message. The program stopped with an error status code. For the most part, the user can hit the up-arrow key to get the previous command line back, make a change, and run the program again. The interaction is delegated to the OS command line.
The name of the program, ch05_r04, isn't too informative. We could perhaps have chosen a more informative name. The positional arguments are two (latitude, longitude) pairs. The output shows the distance between the two in the given units.
How do we parse argument values from the command line?
The first thing we need to do is to refactor our code to create three separate functions:
A function to get the options from the command line. To fit well with the argparse module, this function will almost always return an argparse.Namespace object.
A function that does the real work.
A main function that gathers options and invokes the real work function with the appropriate argument values.
Here's our real work function, display():
from ch03_r05 import haversine, MI, NM, KM

def display(lat1: float, lon1: float, lat2: float, lon2: float, r: str) -> None:
    r_float = {"NM": NM, "KM": KM, "MI": MI}[r]
    d = haversine(lat1, lon1, lat2, lon2, r_float)
    print(f"From {lat1},{lon1} to {lat2},{lon2} in {r} = {d:.2f}")
We've imported the core calculation, haversine(), from another module. We've provided argument values to this function and used an f-string to display the final result message.
We've based this on the calculations shown in the examples in the Picking an order for parameters based on partial functions recipe in Chapter 3, Function Definitions:
The essential calculation yields the central angle, c, between two points. The angle is measured in radians. We convert it into distance by multiplying by the Earth's mean radius in whatever unit we like. If we multiply the angle c by a radius of 3,959 miles, we'll convert the angle to a distance in miles.
Note that we expect the distance conversion factor, r, to be provided as a string. This function will then map the string to an actual floating-point value, r_float. The "MI" string, for example, maps to the conversion value from radians to miles, MI, equal to 3,959.
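For readers without the Chapter 3 module at hand, the calculation described above can be sketched as follows. This is the standard haversine formula, not necessarily the exact code in ch03_r05. The 3,959-mile radius comes from the text; the NM and KM values are assumptions chosen to agree with the sample output shown in this recipe.

```python
from math import asin, cos, radians, sin, sqrt

# Earth's mean radius in several units. Only the 3,959-mile figure
# appears in the text; the NM and KM values are assumed.
MI = 3959.0
NM = 3440.0
KM = 6373.0


def haversine(
    lat_1: float, lon_1: float, lat_2: float, lon_2: float, R: float = NM
) -> float:
    """Great-circle distance between two points given in degrees."""
    d_lat = radians(lat_2) - radians(lat_1)
    d_lon = radians(lon_2) - radians(lon_1)
    a = (
        sin(d_lat / 2) ** 2
        + cos(radians(lat_1)) * cos(radians(lat_2)) * sin(d_lon / 2) ** 2
    )
    c = 2 * asin(sqrt(a))  # central angle between the points, in radians
    return R * c  # the radius converts the angle to a distance


print(f"{haversine(36.12, -86.67, 33.94, -118.40, KM):.2f}")
```

The final multiplication is the unit conversion the text describes: the same angle c yields miles, nautical miles, or kilometers depending on which radius is supplied.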
Here's how the function looks when it's used inside Python:
>>> from ch05_r04 import display
>>> display(36.12, -86.67, 33.94, -118.4, 'NM')
From 36.12,-86.67 to 33.94,-118.4 in NM = 1558.53
This function has two important design features. The first feature is that it avoids references to features of the argparse.Namespace object that's created by argument parsing. Our goal is to have a function that we can reuse in a number of alternative contexts. We need to keep the input and output elements of the user interface separate.
The second design feature is this function displays a value computed by another function. This is a helpful decomposition of the problem. We've separated the user experience of printed output from the essential calculation. This fits the general design pattern of separating processing into tiers and isolating the presentation tier from the application tier.
1. Define the function that gets the options:
def get_options(argv: List[str]) -> argparse.Namespace:
2. Create the parser object:
    parser = argparse.ArgumentParser()
3. Add the various types of arguments to the parser object. Sometimes this is difficult because we're still refining the user experience. It's difficult to imagine all the ways in which people will use a program and all of the questions they might have. For our example, we have two mandatory, positional arguments, and an optional argument:
    parser.add_argument(
        "-r",
        action="store", choices=("NM", "MI", "KM"), default="NM")
    parser.add_argument(
        "p1", action="store", type=point_type)
    parser.add_argument(
        "p2", action="store", type=point_type)
We've added optional and mandatory arguments. The first is the -r argument, which starts with a single dash, -, to mark it as optional. A longer double-dash name could be listed alongside it, in which case either form could be used on the command line; this recipe uses only -r, so the parsed value is available as options.r.
The action of 'store' will store any value that follows the -r option in the command line. We've listed the three possible choices and provided a default. The parser will validate the input and write appropriate errors if the input isn't one of these three values.
The mandatory arguments are named without a - prefix. These also use an action of 'store'; since this is the default action it doesn't really need to be stated. The function provided as the type argument is used to convert the source string to an appropriate Python object. We'll look at the point_type() validation function in this section.
4. Evaluate the parse_args() method of the parser object created in step 2:
    options = parser.parse_args(argv)
By default, the parser uses the values from sys.argv, which are the command-line argument values entered by the user. Testing is much easier when we can provide an explicit argument value.
Here's the final function:
import argparse
from typing import List

def get_options(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", action="store",
        choices=("NM", "MI", "KM"), default="NM")
    parser.add_argument("p1", action="store", type=point_type)
    parser.add_argument("p2", action="store", type=point_type)
    options = parser.parse_args(argv)
    return options
This relies on the point_type() function to both validate the string and convert it to an object of a more useful type. We might use type=int or type=float to convert to a number.
In our example, we used point_type() to convert a string to a (latitude, longitude) two-tuple. Here's the definition of this function:
def point_type(text: str) -> Tuple[float, float]:
    try:
        lat_str, lon_str = text.split(",")
        lat = float(lat_str)
        lon = float(lon_str)
        return lat, lon
    except ValueError as ex:
        raise argparse.ArgumentTypeError(ex)
The point_type()
function parses the input values. First, it separates the two values at the ,
character. It attempts a floating-point conversion on each part. If the float()
functions both work, we have a valid latitude and longitude that we can return as a pair of floating-point values.
If anything goes wrong, an exception will be raised. From this exception, we'll raise an ArgumentTypeError
exception. This is caught by the argparse
module and causes it to report the error to the user.
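To see how the parser reports a conversion failure, here is a minimal, self-contained sketch. It repeats the point_type() function from above; the demo_parser name and the sample strings are assumptions made only for this illustration. When the conversion fails, argparse writes a usage message to standard error and raises SystemExit.

```python
import argparse
from typing import Tuple


def point_type(text: str) -> Tuple[float, float]:
    """Convert "lat,lon" text into a (latitude, longitude) pair."""
    try:
        lat_str, lon_str = text.split(",")
        return float(lat_str), float(lon_str)
    except ValueError as ex:
        # argparse turns this into a usage error for the user.
        raise argparse.ArgumentTypeError(ex)


# Hypothetical one-argument parser, used only for this demonstration.
demo_parser = argparse.ArgumentParser()
demo_parser.add_argument("p1", type=point_type)

good = demo_parser.parse_args(["36.12,-86.67"])
print(good.p1)

try:
    demo_parser.parse_args(["36.12"])  # no comma, so unpacking fails
except SystemExit as ex:
    exit_status = ex.code
    print("parser exited with status", exit_status)
```

Catching SystemExit is only useful in a demonstration or a test; in an application, we would normally let the parser end the program.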
Here's the main script that combines the option parser and the output display functions:
def main(argv: List[str] = sys.argv[1:]) -> None:
    options = get_options(argv)
    lat_1, lon_1 = options.p1
    lat_2, lon_2 = options.p2
    display(lat_1, lon_1, lat_2, lon_2, options.r)

if __name__ == "__main__":
    main()
This main script connects the user inputs to the displayed output. It does this by parsing the command-line options. Given the values provided by the user, these are decomposed into values required by the display()
function, isolating the processing from the input parsing. Let's take a closer look at how argument parsing works.
The argument parser works in three stages:
1. Define the overall context by creating a parser object as an instance of ArgumentParser. We can provide information such as the overall program description. We can also provide a formatter and other options here.
2. Add the individual arguments with the add_argument() method. These can include optional arguments as well as required arguments. Each argument can have a number of features to provide different kinds of syntax. We'll look at a number of the alternatives in the There's more... section.
3. Parse the actual command-line input. The parser's parse_args() method will use sys.argv automatically. We can provide an explicit value instead of the sys.argv values. The most common reason for providing an override value is to allow more complete unit testing.
Some simple programs will have a few optional arguments. A more complex program may have many optional arguments.
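The three stages can be sketched as follows. This is a simplified illustration, not the recipe's own code: the build_parser() and parse() function names, the description text, and the name argument are assumptions chosen for the example.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Stage 1: create the parser; the description text is illustrative.
    parser = argparse.ArgumentParser(description="Compute a distance.")
    # Stage 2: define the arguments.
    parser.add_argument("-u", "--units", choices=("NM", "MI", "KM"), default="NM")
    parser.add_argument("name")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Stage 3: parse. With argv=None, argparse falls back to sys.argv[1:].
    return build_parser().parse_args(argv)


# An explicit list makes the function easy to exercise in a unit test.
options = parse(["--units", "KM", "Norfolk"])
print(options.units, options.name)
```

Passing the argument list explicitly means a test never has to modify sys.argv.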
It's common to have a filename as a positional argument. When a program reads one or more files, the filenames are provided in the command line, as follows:
python3 some_program.py *.rst
We've used the Linux shell's globbing feature: the *.rst
string is expanded into a list of all files that match the naming rule. This is a feature of the Linux shell, and it happens before the Python interpreter starts. This list of files can be processed using an argument defined as follows:
parser.add_argument('file', nargs='*')
All of the names on the command line that do not start with the -
character will be collected into the file
value in the object built by the parser.
We can then use the following:
for filename in options.file:
    process(filename)
This will process each file given in the command line.
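A small, self-contained sketch of this pattern follows. The filenames and the extra -v option are assumptions; the explicit list passed to parse_args() stands in for what the shell would supply after expanding a wildcard.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("file", nargs="*")

# Simulates what the shell would pass after expanding *.rst.
options = parser.parse_args(["-v", "intro.rst", "usage.rst"])
for filename in options.file:
    print("would process", filename)

# With no filenames at all, nargs="*" produces an empty list.
empty = parser.parse_args([])
print(empty.file)
```

Because an empty list is a valid result, the processing loop needs no special case for a command line without filenames.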
For Windows programs, the shell doesn't glob filenames from wildcard patterns, and the application must deal with filenames that contain wildcard characters like "*"
and "?"
in them. The Python glob
module can help with this. Also, the pathlib
module can create Path
objects, which include globbing features.
To support Windows, we might have something like this inside the get_options()
function. This will expand file strings into all matching names:
# Requires: import platform, and from pathlib import Path
if platform.system() == "Windows":
    options.file = list(
        name
        for wildcard in options.file
        for name in Path().glob(wildcard)
    )
This will expand all of the names in the file
parameter to create a new list similar to the list created by the Linux and macOS platforms.
It can be difficult to refer to a file with an asterisk or question mark in the name. For example, a file named something*.py
appears to be a pattern for globbing, not a single filename. We can enclose the pattern wildcard character in []
to create a name that matches literally: something[*].py
will only match the file named something*.py
.
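The bracket form doesn't have to be written by hand. As a brief sketch, the standard library's glob.escape() function produces it, and fnmatch.fnmatchcase() can confirm what the resulting pattern matches; the second filename is an assumption made up for the comparison.

```python
import fnmatch
import glob

literal_name = "something*.py"

# Wrap each wildcard character in [] so it matches only itself.
pattern = glob.escape(literal_name)
print(pattern)

# The escaped pattern matches the literal name and nothing else.
print(fnmatch.fnmatchcase("something*.py", pattern))
print(fnmatch.fnmatchcase("something_else.py", pattern))
```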
Some applications have very complex argument parsing options. Very complex applications may have dozens of individual commands. As an example, look at the git
version control program; this application uses dozens of separate commands, such as git clone
, git commit
, and git push
. Each of these commands has unique argument parsing requirements. We can use argparse
to create a complex hierarchy of these commands and their distinct sets of arguments.
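One way to build such a hierarchy is the add_subparsers() method of a parser. The following is a minimal sketch; the tool program name, the two subcommands, and their arguments are assumptions chosen to echo the git example, not a reproduction of git's real interface.

```python
import argparse

parser = argparse.ArgumentParser(prog="tool")
# dest records which subcommand was chosen; required=True rejects an empty command.
subcommands = parser.add_subparsers(dest="command", required=True)

clone = subcommands.add_parser("clone")
clone.add_argument("url")

commit = subcommands.add_parser("commit")
commit.add_argument("-m", "--message", default="")

options = parser.parse_args(["commit", "-m", "Fix typo"])
print(options.command, options.message)
```

Each subparser is a complete parser in its own right, so every command can define its own options without affecting the others.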
What kinds of arguments can we process? There are a lot of argument styles in common use. All of these variations are defined using the add_argument() method of a parser:
- Simple options: The -o or --option arguments are often used to enable or disable features of a program. These are often implemented with add_argument() parameters of action='store_true', default=False. Sometimes the implementation is simpler if the application uses action='store_false', default=True. The choice of default value and stored value may simplify the programming, but it won't change the user's experience.
- Simple options with non-trivial objects: The user sees these as simple -o or --option arguments. We may want to implement this using a more complex object that's not a simple Boolean constant. We can use action='store_const', const=some_object, and default=another_object. As modules, classes, and functions are also objects, a great deal of sophistication is available here.
- Options with values: We showed -r unit as an argument that accepted the string name for the units to use. We implemented this with an action='store' assignment to store the supplied string value. We can also use the type=function option to provide a function that validates or converts the input into a useful form.
- Options that increment a counter: We can use action='count', default=0 to count the number of times a given argument is present. The user can provide -v for verbose output and -vv for very verbose output. The argument parser treats -vv as two instances of the -v argument, which means that the value will increase from the initial value of 0 to 2.
- Options that accumulate a list: We could have an argument definition with action='append', default=[]. This would allow the user to use -r NM -r KM to get a display in both nautical miles and kilometers. This would require a significant change to the display() function, of course, to handle multiple units in a collection.
- Show the help text: If we do nothing, then -h and --help will display a help message and exit. This will provide the user with useful information. We can disable this or change the argument string, if we need to. This is a widely used convention, so it seems best to do nothing so that it's a feature of our program.
- Show the version number: It's common to have --Version as an argument to display the version number and exit. We implement this with add_argument("--Version", action="version", version="v 3.14"). We provide an action of version and an additional keyword argument that sets the version to display.
This covers most of the common cases for command-line argument processing. Generally, we'll try to leverage these common styles of arguments when we write our own applications. If we strive to use simple, widely used argument styles, our users are somewhat more likely to understand how our application works.
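Several of these styles can be combined in one parser. The sketch below is only an illustration: the --fmt option and the use of str.upper and str.lower as the stored objects are assumptions, chosen to show that functions can be stored as constants.

```python
import argparse

parser = argparse.ArgumentParser()
# Simple on/off option.
parser.add_argument("-o", "--option", action="store_true", default=False)
# Option that selects one of two objects; here the objects are functions.
parser.add_argument("--fmt", action="store_const", const=str.upper, default=str.lower)
# Counter: each -v adds one.
parser.add_argument("-v", action="count", default=0)
# Accumulating list: each -r appends a value.
parser.add_argument("-r", action="append", default=[])

options = parser.parse_args(["-o", "--fmt", "-vv", "-r", "NM", "-r", "KM"])
print(options.option, options.v, options.r, options.fmt("Done"))
```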
There are a few Linux commands that have even more complex command-line syntax. Some Linux programs, such as find
or expr
, have arguments that can't easily be processed by argparse
. For these edge cases, we would need to write our own parser using the values of sys.argv
directly.
There are several ways of creating interactive applications. The Using input() and getpass() for user input recipe looked at functions such as input()
and getpass.getpass()
. The Using argparse to get command-line input recipe showed us how to use argparse
to create applications with which a user can interact from the OS command line.
We have a third way to create interactive applications: using the cmd
module. This module will prompt the user for input, and then invoke a specific method of the class we provide.
Here's how the interaction will look; the user's input follows the ] prompt:
A dice rolling tool. ? for help.
] help
Documented commands (type help <topic>):
========================================
dice help reroll roll
Undocumented commands:
======================
EOF quit
] help roll
Roll the dice. Use the dice command to set the number of dice.
] help dice
Sets the number of dice to roll.
] dice 5
Rolling 5 dice
] roll
[6, 6, 4, 3, 3]
]
There's an introductory message from the application with a very short explanation. The application displays a prompt, ]
. The user can then enter any of the available commands.
When we enter help
as a command, we see a display of the commands. Four of the commands have further details. The other two, EOF
and quit
, have no further details available.
When we enter help roll
, we see a brief summary for the roll
command. Similarly, entering help dice
displays information about the dice
command. We entered the dice 5
command to set the number of dice, and then the roll
command showed the results of rolling five dice. This shows the essence of how an interactive command-line application prompts for input, reads commands, evaluates, and prints a result.
The core feature of the cmd.Cmd application is a read-evaluate-print loop (REPL). This kind of application works well when there are a large number of individual state changes and a large number of commands to make those state changes.
We'll make use of a simple, stateful dice game. The idea is to have a handful of dice, some of which can be rolled and some of which are frozen. This means our Cmd
class definition will have some attributes that describe the current state of the handful of dice.
We'll define a small domain of commands, to roll and re-roll a handful of dice. The interaction will look like the following:
] roll
[4, 4, 1, 6, 4, 6]
] reroll 2 3 5
[4, 4, 6, 5, 4, 5] (reroll 1)
] reroll 2 3 5
[4, 4, 1, 3, 4, 3] (reroll 2)
In this example, the roll
command rolled six dice. The two reroll
commands created a hand for a particular game by preserving the dice from positions 0, 1, and 4, and rerolling the dice in positions 2, 3, and 5.
How can we create stateful, interactive applications with an REPL?
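Import the cmd module to make the cmd.Cmd class definition available. The dice commands also rely on the random module:
import cmd
import random
Define an extension to cmd.Cmd:
class DiceCLI(cmd.Cmd):
Define any required initialization in the preloop() method:
    def preloop(self):
        self.n_dice = 6
        self.dice = None  # no initial roll.
        self.reroll_count = 0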
This preloop() method is evaluated just once when the processing starts. The self argument is a requirement for methods within a class. For now, it's simply required syntax. In Chapter 7, Basics of Classes and Objects, we'll look at this more closely.
Initialization can also be done in an __init__()
method. Doing this is a bit more complex, though, because it must collaborate with the Cmd
class initialization. It's easier to do initialization separately in the preloop()
method.
For each command, create a do_command() method. The name of the method will be the command, prefixed by do_. The user's input text after the command will be provided as an argument value to the method. The docstring comment in the method definition is the help text for the command. Here are the methods for the roll, reroll, and dice commands:
    def do_roll(self, arg: str) -> bool:
        """Roll the dice. Use the dice command to set the number of dice."""
        self.dice = [random.randint(1, 6) for _ in range(self.n_dice)]
        print(f"{self.dice}")
        return False
    def do_reroll(self, arg: str) -> bool:
        """Reroll selected dice. Provide the 0-based positions."""
        try:
            # list() forces the conversion now, so a bad value is caught here.
            positions = list(map(int, arg.split()))
        except ValueError as ex:
            print(ex)
            return False
        for p in positions:
            self.dice[p] = random.randint(1, 6)
        self.reroll_count += 1
        print(f"{self.dice} (reroll {self.reroll_count})")
        return False
    def do_dice(self, arg: str) -> bool:
        """Sets the number of dice to roll."""
        try:
            self.n_dice = int(arg)
        except ValueError:
            print(f"{arg!r} is invalid")
            return False
        self.dice = None
        print(f"Rolling {self.n_dice} dice")
        return False
Create an instance of the class and execute its cmdloop() method:
if __name__ == "__main__":
    game = DiceCLI()
    game.cmdloop()
We've created an instance of our DiceCLI
subclass of Cmd
. When we execute the cmdloop()
method, the class will write any introductory messages that have been provided, write the prompt, and read a command.
The Cmd class contains a large number of built-in features for displaying a prompt, reading input from a user, and then locating the proper method based on the user's input.
For example, when we enter dice 5
, the built-in methods of the Cmd
superclass will strip the first word from the input, dice
, prefix this with do_
, and then evaluate the method that implements the command. The argument value will be the string "5"
.
If we enter a command for which there's no matching do_
method, the command processor writes an error message. This is done automatically; we don't need to write any code to handle invalid commands.
Some methods, such as do_help()
, are already part of the application. These methods will summarize the other do_*
methods. When one of our methods has a docstring, this can be displayed by the built-in help feature.
The Cmd
class relies on Python's facilities for introspection. An instance of the class can examine the method names to locate all of the methods that start with do_
. They're available in a class-level __dict__
attribute. Introspection is an advanced topic, one that will be touched on in Chapter 8, More Advanced Class Design.
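The lookup idea can be illustrated with a few lines of ordinary Python. This is a deliberately simplified stand-in, not the actual cmd.Cmd source; the MiniDispatcher class and its method names are assumptions invented for the example.

```python
class MiniDispatcher:
    """Illustrative stand-in for the lookup idea; not the real cmd.Cmd code."""

    def do_hello(self, arg: str) -> str:
        return f"hello {arg}"

    def commands(self) -> list:
        # dir() reports attribute names, including inherited ones.
        return sorted(n[3:] for n in dir(self) if n.startswith("do_"))

    def dispatch(self, line: str) -> str:
        # Split off the first word, then look for a matching do_ method.
        name, _, arg = line.partition(" ")
        method = getattr(self, f"do_{name}", None)
        if method is None:
            return f"*** Unknown syntax: {line}"
        return method(arg)


d = MiniDispatcher()
print(d.commands())
print(d.dispatch("hello world"))
print(d.dispatch("bogus"))
```

Adding a new command to a class like this only requires adding a new do_ method; no table of commands has to be maintained separately.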
The Cmd class has a number of additional places where we can add interactive features:
- We can define help_*() methods that become part of the help topics.
- When any of the do_* methods return a non-False value, the loop will end. We might want to add a do_quit() method that has return True as its body. This will end the command-processing loop.
- We can provide a method named emptyline() to respond to blank lines. One choice is to do nothing quietly. Another common choice is to have a default action that's taken when the user doesn't enter a command.
- The default() method is evaluated when the user's input does not match any of the do_* methods. This might be used for more advanced parsing of the input.
- The postloop() method can be used to do some processing just after the loop finishes. This would be a good place to write a summary. This also requires a do_* method that returns a value (any non-False value) to end the command loop.
Also, there are a number of attributes we can set. These are class-level variables that would be peers of the method definitions:
- The prompt attribute is the prompt string to write. For our example, we can do the following:
class DiceCLI(cmd.Cmd):
    prompt = "] "
- The intro attribute is the introductory message.
- We can tailor the help output by setting the doc_header, undoc_header, misc_header, and ruler attributes. These will all alter how the help output looks.
The goal is to be able to create a tidy class that handles user interaction in a way that's simple and flexible. This class creates an application that has a lot of features in common with Python's REPL. It also has features in common with many command-line programs that prompt for user input.
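The hooks and attributes described above might be combined as in the following sketch. The CounterCLI class and its add command are hypothetical; only the hook names, the attribute names, and the onecmd() method come from cmd.Cmd. Calling onecmd() directly lets us exercise the commands without an interactive terminal.

```python
import cmd


class CounterCLI(cmd.Cmd):
    """Hypothetical example class; only the hook names come from cmd.Cmd."""

    prompt = "] "
    intro = "A counter. ? for help."

    def preloop(self) -> None:
        self.count = 0

    def do_add(self, arg: str) -> bool:
        """Add an integer to the counter."""
        try:
            self.count += int(arg)
        except ValueError:
            print(f"{arg!r} is invalid")
        return False

    def do_quit(self, arg: str) -> bool:
        """Leave the application."""
        return True  # a non-False value ends cmdloop()

    def emptyline(self) -> bool:
        return False  # do nothing on a blank line

    def default(self, line: str) -> bool:
        print(f"Unrecognized: {line!r}")
        return False

    def postloop(self) -> None:
        print(f"Final count: {self.count}")


# Drive the class without a terminal by calling the hooks directly.
app = CounterCLI()
app.preloop()
app.onecmd("add 5")
app.onecmd("add x")
stop = app.onecmd("quit")
app.postloop()
```

In normal use, a single call to app.cmdloop() would invoke preloop(), the command methods, and postloop() for us.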
One example of these interactive applications is the command-line FTP client in Linux. It has a prompt of ftp>, and it parses dozens of individual FTP commands. Entering help will show all of the various internal commands that are part of FTP interaction.
There are several ways to look at inputs provided by the users of our software. Environment variables are one of them: they are OS-level settings, often defined in shell startup files such as the ~/.zshrc file and the ~/.profile file. There can also be system-wide files, like /etc/zshrc. This makes the values persistent and less interactive than the command line. Other shells offer other filenames for settings and configurations unique to the shell.
In the Using input() and getpass() for user input and Using cmd for creating command-line applications recipes, we looked at interaction with the user. In the Using argparse to get command-line input recipe, we looked at how to handle command-line arguments. We'll look at configuration files in Chapter 13, Application Integration: Configuration.
The environment variables are available through the os
module. How can we get an application's configuration based on these OS-level settings?
We may want to provide information of various types to a program via OS environment variable settings. There's a profound limitation here: the OS settings can only be string values. This means that many kinds of settings will require some code to parse the value and create proper Python objects from the string.
When we work with argparse to parse command-line arguments, this module can do some data conversions for us. When we use os to process environment variables, we'll have to implement the conversion ourselves.
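As a minimal sketch of such a conversion, the helper below reads an integer setting from an environment mapping. The env_int() function and the N_DICE variable name are assumptions invented for this example; a plain dictionary stands in for os.environ so that the result doesn't depend on the real environment.

```python
import os
from typing import Mapping


def env_int(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Read an integer setting; env_int is a hypothetical helper name."""
    text = environ.get(name)
    if text is None:
        return default
    try:
        return int(text)
    except ValueError:
        # Report which variable was wrong, not just the bad text.
        raise ValueError(f"{name}={text!r} is not an integer") from None


# A plain dict stands in for os.environ so the example is repeatable.
fake_env = {"N_DICE": "5"}
print(env_int("N_DICE", 6, fake_env))
print(env_int("MISSING", 6, fake_env))
```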
In the Using argparse to get command-line input recipe, we wrapped the haversine()
function in a simple application that parsed command-line arguments.
At the OS level, we created a program that worked like this:
% python3 ch05_r04.py -r KM 36.12,-86.67 33.94,-118.40
From (36.12, -86.67) to (33.94, -118.4) in KM = 2887.35
After using this version of the application for a while, we found that we're often using nautical miles to compute distances from where our boat is anchored. We'd really like to have default values for one of the input points as well as the -r
argument.
Since a boat can be anchored in a variety of places, we need to change the default without having to tweak the actual code.
We'll set an OS environment variable, UNITS
, with the distance units. We can set another variable, HOME_PORT
, with the home point. We want to be able to do the following:
% export UNITS=NM
% export HOME_PORT=36.842952,-76.300171
% python3 ch05_r06.py 36.12,-86.67
From 36.12,-86.67 to 36.842952,-76.300171 in NM = 502.23
The units and the home point values are provided to the application via the OS environment. This can be set in a configuration file so that we can make easy changes. It can also be set manually, as shown in the example.
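Import the os module. The OS environment is available through this module. The later steps also need the argparse and sys modules, the List type hint, and the functions from the earlier recipes:
import argparse
import os
import sys
from typing import List
from Chapter_03.ch03_r08 import haversine, MI, NM, KM
from Chapter_05.ch05_r04 import point_type, display
Define a function that uses the environment values as defaults for the optional command-line arguments. The default set of arguments to parse comes from sys.argv, so it's important to also import the sys module:
def get_options(argv: List[str] = sys.argv[1:]) -> argparse.Namespace:
    default_units = os.environ.get("UNITS", "KM")
    if default_units not in ("KM", "NM", "MI"):
        sys.exit(f"Invalid UNITS, {default_units!r} not KM, NM, or MI")
    default_home_port = os.environ.get("HOME_PORT")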
The sys.exit()
function handles the error processing nicely. It will print the message and exit with a non-zero status code.
Create the parser object. Provide any default values for the relevant arguments. This depends on the argparse module, which must also be imported:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-u", "--units",
        action="store", choices=("NM", "MI", "KM"), default=default_units
    )
    parser.add_argument("p1", action="store", type=point_type)
    parser.add_argument(
        "p2", nargs="?", action="store", type=point_type, default=default_home_port
    )
    options = parser.parse_args(argv)
Do a final check for the case where there is no value for HOME_PORT and no value provided for the second command-line argument. This requires an if statement and a call to sys.exit():
    if options.p2 is None:
        sys.exit("Neither HOME_PORT nor p2 argument provided.")
Return the options object with the set of valid arguments:
    return options
This will allow the -r
argument and the second point to be completely optional. The argument parser will use the configuration information to supply default values if these are omitted from the command line.
Use the Using argparse to get command-line input recipe for ways to process the options created by the get_options()
function.
We've used the OS environment variables to create default values that can be overridden by command-line arguments. If the environment variable is set, that string is provided as the default to the argument definition. If the environment variable is not set, then an application-level default value is used.
In the case of the UNITS
variable, in this example, the application uses kilometers as the default if the OS environment variable is not set.
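This gives us three tiers of interaction:
- We can define a setting in the startup file for the shell in use. For bash, it is the .bashrc file; for zsh, it is the .zshrc file. For Windows, we can use the Windows Advanced Settings option to make a change that is persistent. This value will be used each time we log in or create a new command window.
- We can set the environment variable manually at the command line, as shown in the example. This setting lasts only for the current session.
- We can provide a value through the command-line arguments each time the program is run.
Note that there's no built-in or automatic validation of the values retrieved from environment variables. We'll need to validate these strings to ensure that they're meaningful.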
Also note that we've repeated the list of valid units in several places. This violates the Don't Repeat Yourself (DRY) principle. A global variable with a valid collection of values is a good improvement to make. (Python lacks formal constants, which are variables that cannot be changed. It's common to treat globals as if they are constants that should not be changed.)
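One possible shape for that improvement is sketched below. The UNIT_CHOICES name and the get_units() function are assumptions made for this illustration; the environment is passed in as a parameter so that a plain dictionary can stand in for os.environ.

```python
import argparse
import os
from typing import List, Mapping

UNIT_CHOICES = ("NM", "MI", "KM")  # treated as a constant by convention


def get_units(argv: List[str], environ: Mapping[str, str] = os.environ) -> str:
    # The same tuple validates the environment value and the command line.
    default_units = environ.get("UNITS", "KM")
    if default_units not in UNIT_CHOICES:
        raise SystemExit(f"Invalid UNITS, {default_units!r} not in {UNIT_CHOICES}")
    parser = argparse.ArgumentParser()
    parser.add_argument("-u", "--units", choices=UNIT_CHOICES, default=default_units)
    return parser.parse_args(argv).units


print(get_units([], {"UNITS": "NM"}))            # environment supplies the default
print(get_units(["-u", "MI"], {"UNITS": "NM"}))  # command line overrides it
print(get_units([], {}))                         # application-level default
```

With the choices defined once, adding a new unit means changing a single line.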
The Using argparse to get command-line input recipe shows a slightly different way to handle the default command-line arguments available from sys.argv
. The first of the arguments is the name of the Python application being executed and is not often relevant to argument parsing.
The value of sys.argv will be a list of strings:
['ch05_r06.py', '-r', 'NM', '36.12,-86.67']
We have to skip the initial value in sys.argv[0] at some point in the processing. We have two choices:
- In this recipe, the first item is skipped when providing sys.argv[1:] to the parser.
- In the earlier recipe, the main() function used options = get_options(sys.argv[1:]) to provide the shorter list to the parser.
Generally, the only relevant distinction between the two approaches is the number and complexity of the unit tests. This recipe will require a unit test that includes an initial argument string, which will be discarded during parsing.