To be useful, a program needs to communicate with the world around it. It needs to interact with the user, or read and write files, or access Web pages, and so on. In general, we refer to this as input and output, or I/O for short.
We’ve already seen basic console I/O, which involves printing messages and using the input function to read strings from the user. Now we’ll see some string formatting that lets you make fancy output strings for console I/O and anywhere else you need a formatted string.
Then we’ll turn to file I/O, which is all about reading and writing files. Python provides a lot of support for basic file I/O, making it as easy as possible for programmers. In particular, we’ll see how to use text files, binary files, and the powerful pickle module.
Python provides a number of different ways to create formatted strings. We will discuss the older string interpolation and the newer format strings.
String interpolation is a simple approach to string formatting that Python borrows from the C programming language. For instance, here’s how you can control the number of decimal places in a float:
>>> x = 1/81
>>> print(x)
0.0123456790123
>>> print('value: %.2f' % x)
value: 0.01
>>> print('value: %.5f' % x)
value: 0.01235
String interpolation expressions always have the form format % values, where format is a string containing one or more occurrences of the % character. In the example 'value: %.2f' % x, the substring %.2f is a formatting command that tells Python to take the first supplied value (x) and to display it as a floating point value with two decimal places.
The character f in the format string is a conversion specifier, and it tells Python how to render the corresponding value. Table 8.1 lists the most commonly used conversion specifiers.
The e, E, and f specifiers give you different ways of representing floats. For example:
>>> x
0.012345679012345678
>>> print('x = %f' % x)
x = 0.012346
>>> print('x = %e' % x)
x = 1.234568e-02
>>> print('x = %E' % x)
x = 1.234568E-02
You can put as many specifiers as you need in a format string, although you must supply exactly one value for each specifier. For example:
>>> a, b, c = 'cat', 3.14, 6
>>> s = "There's %d %ss older than %.2f years" % (c, a, b)
>>> s
"There's 6 cats older than 3.14 years"
As this example shows, the format string acts as a simple template that gets filled in by the values. The values are given in a tuple, and they must be in the order in which you want them replaced.
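As a further sketch of the template idea, the snippet below reuses one format string to print several aligned rows. The width codes (%-8s and %6.2f) and the sample data are assumptions added for illustration; they do not come from the examples above.

```python
# Hypothetical sample data: (name, price) pairs.
items = [('apple', 0.5), ('melon', 3.25), ('kiwi', 12.0)]

# %-8s left-justifies a string in 8 columns;
# %6.2f right-justifies a float in 6 columns with 2 decimal places.
row_template = '%-8s|%6.2f'

# The same template is filled in once per tuple of values.
rows = [row_template % (name, price) for name, price in items]

for row in rows:
    print(row)
```

Because each row supplies its values as a tuple in the same order as the specifiers, the template itself never has to change.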
A second way to create fancy strings in Python is to use format strings with the string method format. For example:
>>> 'My {pet} has {prob}'.format(pet = 'dog', prob='fleas')
'My dog has fleas'
In a format string, each name within curly braces is replaced by the keyword argument of the same name. This is called named replacement, and it is especially readable in this example.
You can also replace values by position:
>>> 'My {0} has {1}'.format('dog', 'fleas')
'My dog has fleas'
Or apply formatting codes similar to interpolated strings:
>>> '1/81 = {x}'.format(x=1/81)
'1/81 = 0.0123456790123'
>>> '1/81 = {x:f}'.format(x=1/81)
'1/81 = 0.012346'
>>> '1/81 = {x:.3f}'.format(x=1/81)
'1/81 = 0.012'
You can specify formatting parameters within braces, like this:
>>> 'num = {x:.{d}f}'.format(x=1/81, d=3)
'num = 0.012'
>>> 'num = {x:.{d}f}'.format(x=1/81, d=4)
'num = 0.0123'
This is something you can’t do with regular string interpolation.
If you need the { or } characters to appear as themselves in a format string, type them as {{ and }}.
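Here is a small sketch that combines doubled braces with named replacement and a formatting parameter inside braces. The template and the names key, value, and places are made up for illustration.

```python
# {{ and }} produce literal braces; {key} and {value:...} are replaced.
# The precision of value is itself supplied by the places argument.
template = '{{"{key}": {value:.{places}f}}}'

result = template.format(key='ratio', value=1/81, places=4)
print(result)
```

The outermost pairs of braces survive as single literal braces, while everything else in braces is filled in.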
Format strings are more flexible and powerful than string interpolation, but also more complicated. If you are creating only a few simple formatted strings, string interpolation is probably the best choice. Otherwise, format strings are more useful for larger and more complex formatting jobs, such as creating Web pages or form letters for e-mail.
A file is a named collection of bits stored on a secondary storage device, such as a hard disk, USB drive, flash memory stick, and so on. We distinguish between two categories of files: text files, which are essentially strings stored on disk, and binary files, which are everything else.
Text files have the following characteristics:
They are essentially “strings on disk.” Python source code files and HTML files are examples of text files.
They can be edited with any text editor. Thus, they are relatively easy for humans to read and modify.
They tend to be difficult for programs to read. Typically, relatively complex programs called parsers are needed to read each different kind of text file. For instance, Python uses a special-purpose parser to help read .py files, while HTML parsers are used to read HTML files.
They are usually larger than equivalent binary files. This can be a major problem when, for instance, you need to send a large text file over the Internet. Thus, text files are often compressed (for example, into zip format) to speed up transmission and to save disk space.
Binary files have the following characteristics:
They are not usually human-readable, at least within a regular text editor. A binary file is displayed in a text editor as a random-looking series of characters. Some kinds of binary files, such as JPEG image files, have special viewers for displaying their content.
They usually take up less space than equivalent text files. For instance, a binary file might group the information within it in chunks of 32 bits without using commas, spaces, or any kind of separator character.
They are often easier to read and write than text files. While each binary file is different, it’s often not necessary to write complex parsers to read them.
They are often tied to a specific program and are often unusable if you lack that program. Some popular binary files may have their file formats published so that you can, if so motivated, write your programs to read and write them. However, this usually requires substantial effort.
In addition to files, folders (or directories) are used to store files and other folders. The folder structure of most file systems is quite large and complex, forming a hierarchical folder structure.
A pathname is the name used to identify a file or a folder. The full pathname can be quite long. For example, the Python folder on my Windows computer has this full pathname: C:\Documents and Settings\tjd\Desktop\python.
Windows pathnames use a backward slash (\) character to separate names in a path, and they start with the letter of the disk drive (in this example, C:).
On Mac and Linux systems, a forward slash (/) is used to separate names. Plus, there is no drive letter at the start. For example, here is the full pathname for my Python folder on Linux: /home/tjd/Desktop/python.
Recall that if you want to write a \ character in a Python string, it must be doubled:
'C:\\home\\tjd\\Desktop\\python'
To avoid the double backslashes, you can use a raw string:
r'C:\home\tjd\Desktop\python'
Getting Python programs to work with both styles of pathnames is a bit tricky, and you should read the documentation for Python’s os.path module for (much!) more information.
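As one small illustration of what os.path offers, the sketch below builds a pathname with os.path.join, which inserts whichever separator the current system uses, and then takes it apart again with os.path.split. The folder names and the file name notes.txt are hypothetical.

```python
import os.path

# os.path.join glues the parts together with the right separator
# for the system the program is running on.
folder = os.path.join('home', 'tjd', 'Desktop')
full = os.path.join(folder, 'python', 'notes.txt')

# os.path.split separates the final name from everything before it.
head, tail = os.path.split(full)

print(full)
print(head)
print(tail)
```

Building paths this way avoids hard-coding either kind of slash in your program.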
Many programs use the idea of a current working directory, or cwd. This is simply one directory that has been designated as the default directory: Whenever you do something to a file or a folder without providing a full pathname, Python assumes you mean a file or a folder in the current working directory.
Python provides many functions that return information about your computer’s files and folders (its file system). Table 8.2 lists a few of the most useful ones.
Table 8.2. Useful File and Folder Functions
NAME | ACTION |
---|---|
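os.getcwd() | Returns the name of the current working directory |
os.listdir(path) | Returns a list of strings of the names of all the files and folders in the folder specified by path |
os.chdir(path) | Sets the current working directory to be path |
os.path.isfile(path) | Returns True if path is the name of a file, and False otherwise |
os.path.isdir(path) | Returns True if path is the name of a folder, and False otherwise |
os.stat(fname) | Returns information about fname, such as its size in bytes |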
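To make the idea of the current working directory concrete, here is a minimal sketch that uses os.getcwd and os.chdir. It assumes a throwaway folder created with the tempfile module, and the file name note.txt is made up; the with statement it uses simply cleans up when the block ends.

```python
import os
import tempfile

original = os.getcwd()          # remember where we started

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)               # make the temporary folder the cwd

    # No folder is given, so the file is created in the new cwd.
    f = open('note.txt', 'w')
    f.write('hello')
    f.close()

    created_in_tmp = os.path.isfile(os.path.join(tmp, 'note.txt'))
    names = os.listdir(os.getcwd())

    os.chdir(original)          # restore the old cwd before cleanup

print(created_in_tmp)
print(names)
```

Restoring the original directory at the end keeps the rest of the program from being surprised by a changed default.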
Let’s write a couple of useful functions to see how these work. For instance, a common task is retrieving the files and folders in the current working directory. Writing os.listdir(os.getcwd()) is unwieldy, so we can write this function:
# list.py
import os

def list_cwd():
    return os.listdir(os.getcwd())
The following two related helper functions use list comprehensions to return just the files and folders in the current working directory:
# list.py
def files_cwd():
    return [p for p in list_cwd() if os.path.isfile(p)]

def folders_cwd():
    return [p for p in list_cwd() if os.path.isdir(p)]
If you just want a list of, say, the .py files in the current working directory, then this will work:
# list.py
def list_py(path = None):
    if path is None:
        path = os.getcwd()
    # join path and fname so that isfile also works when path is not the cwd
    return [fname for fname in os.listdir(path)
            if os.path.isfile(os.path.join(path, fname))
            if fname.endswith('.py')]
This function plays a useful trick with its input parameter: If you call list_py() without a parameter, it runs on the current working directory. Otherwise, it runs on the directory you pass in.
Finally, here’s a function that returns the sum of the sizes of the files in the current working directory:
# list.py
def size_in_bytes(fname):
    return os.stat(fname).st_size

def cwd_size_in_bytes():
    total = 0
    for name in files_cwd():
        total = total + size_in_bytes(name)
    return total
To save space, we’ve removed the doc strings for these functions. However, the supplementary code files on Google’s “pythonintro” Web site (http://pythonintro.googlecode.com) all include doc strings.
You can tell from the name cwd_size_in_bytes that the return value will be in bytes. Putting the unit of the return value in the function name prevents the need to check the documentation for the units.
In general, it’s a good idea to use lots of functions. Even single-line functions such as list_cwd() are useful because they make your programs easier to read and maintain.
The os.stat() function is fairly complex and provides much more information about files than we’ve shown here. Check Python’s online documentation for more information (http://docs.python.org/dev/3.0/library/os.html).
Python makes it relatively easy to process text files. In general, file processing follows the three steps shown in Figure 8.1.
Perhaps the most common way of reading a text file is to read it one line at a time. For example, this prints the contents of a file to the screen:
# printfile.py
def print_file1(fname):
    f = open(fname, 'r')
    for line in f:
        print(line, end = '')
    f.close()  # optional
The first line of the function opens the file: open requires the name of the file you want to process, and also the mode you want it opened in. We are only reading the file, so we open the file in read mode 'r'. Table 8.3 lists Python’s main file modes.
The open function returns a special file object, which represents the file on disk. Importantly, open does not read the file into RAM. Instead, in this example, the file is read a line at a time using a for-loop.
The last line of print_file1 closes the file. As the comment notes, this is optional: Python almost always automatically closes files for you. In this case, variable f is local to print_file1, so when print_file1 ends, Python automatically closes and then deletes the file object (not the file itself, of course!) that f points to.
The call to print in print_file1 sets end = '' because the lines of a file already end with a \n character (with the possible exception of the last line). Thus if we had written just print(line), the file would be displayed with extra blank lines (try it and see!).
If errors occur while a file is open, it is possible that the program could end without the file being properly closed. In the next chapter, we will see how to handle such errors and ensure that a file is always correctly closed.
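As a brief preview, and not necessarily the approach the next chapter takes, one common idiom is Python’s with statement, which closes the file when its block ends even if an error occurs inside it. The function read_lines and the file demo.txt below are hypothetical names used only for this sketch.

```python
import os
import tempfile

def read_lines(fname):
    # The with block closes f when it ends, even if an error is
    # raised while the lines are being read.
    with open(fname, 'r') as f:
        lines = f.readlines()
    # f.closed tells us whether the file object has been closed.
    return lines, f.closed

# Small demonstration using a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.txt')
    with open(demo_path, 'w') as out:
        out.write('one\ntwo\n')
    lines, was_closed = read_lines(demo_path)

print(lines)
print(was_closed)
```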
Another common way of reading a text file is to read it as one big string. For example:
# printfile.py
def print_file2(fname):
    f = open(fname, 'r')
    print(f.read())
    f.close()
This is shorter and simpler than print_file1, so many programmers prefer it. However, if the file you are reading is very large, it will take up a lot of RAM, which could slow down, or even crash, your computer.
Finally, we note that many programmers would write this function with a single-line body:
# printfile.py
def print_file3(fname):
    print(open(fname, 'r').read())
While this more compact form might take some getting used to, many programmers like this style because it is both quick to type and still relatively readable.
Writing text files is only a little more involved than reading them. For example, this function creates a new text file named story.txt:
# write.py
def make_story1():
    f = open('story.txt', 'w')
    f.write('Mary had a little lamb, ')
    f.write('and then she had some more. ')
The 'w' tells Python to open the file in write mode. To put text into the file, you call f.write with the string you want to put into the file. Strings are written to the file in the order in which they are given.
Important: If story.txt already exists, then calling open('story.txt', 'w') will erase its contents! If you want to avoid overwriting story.txt, you need to check to see if it exists:
# write.py
import os

def make_story2():
    if os.path.isfile('story.txt'):
        print('story.txt already exists')
    else:
        f = open('story.txt', 'w')
        f.write('Mary had a little lamb, ')
        f.write('and then she had some more. ')
One common way of adding strings to a text file is to append them to the end of the file. Unlike 'w' mode, this does not delete anything that might already be in the file. For example:
def add_to_story(line, fname = 'story.txt'):
    f = open(fname, 'a')
    f.write(line)
The important thing to note here is that the file is opened in append mode 'a'.
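The difference between the two modes can be seen in a short sketch. It assumes a temporary folder so that no real story.txt is touched, and the two lines of text are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'story.txt')

    f = open(path, 'w')     # 'w' creates the file (or empties it)
    f.write('first line\n')
    f.close()

    f = open(path, 'a')     # 'a' keeps what is there and adds to the end
    f.write('second line\n')
    f.close()

    f = open(path, 'r')
    contents = f.read()
    f.close()

print(contents)
```

Had the second open used 'w' instead of 'a', only the second line would remain in the file.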
Writing a string to the beginning of a file is not as easy as appending one to the end because the Windows, Linux, and Macintosh operating systems don’t directly support inserting text at the beginning of a text file. Perhaps the simplest way to insert text at the beginning of a file is to read the file into a string, insert the new text into the string, and then write the string back to the original file. For example:
def insert_title(title, fname = 'story.txt'):
    f = open(fname, 'r+')
    temp = f.read()
    temp = title + ' ' + temp
    f.seek(0)  # reset file pointer to beginning
    f.write(temp)
First, notice that we open the file using the special mode 'r+', which means the file can be both read from and written to. Then we read the entire file into the string temp and insert the title using string concatenation.
Before writing the newly created string back into the file, we first have to tell the file object f to reset its internal file pointer. All text file objects keep track of where they are in the file, and after f.read() is called, the file pointer is at the very end. Calling f.seek(0) puts it back at the start of the file, so that when we write to f, it begins at the start of the file.
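The following sketch, which uses a made-up file named pointer.txt in a temporary folder, shows the file pointer at work. It also shows why the whole string has to be written back: writing at the start of a file overwrites the characters already there rather than pushing them along.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pointer.txt')

    f = open(path, 'w')
    f.write('abcdef')
    f.close()

    f = open(path, 'r+')
    text = f.read()              # pointer is now at the end of the file
    f.seek(0)                    # move it back to the beginning
    pos_after_seek = f.tell()    # tell() reports the pointer position
    f.write('XY')                # overwrites 'ab'; it does not insert
    f.close()

    f = open(path, 'r')
    final = f.read()
    f.close()

print(text)
print(final)
```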
If a file is not a text file, then it is considered to be a binary file. Binary files are opened in 'b' mode, and you access the individual bytes of the file. For example:
def is_gif(fname):
    f = open(fname, 'br')
    first4 = tuple(f.read(4))
    return first4 == (0x47, 0x49, 0x46, 0x38)
This function tests if fname is a GIF image file by checking to see if its first 4 bytes are (0x47, 0x49, 0x46, 0x38) (all GIFs must start with those 4 bytes).
In Python, numbers like 0x47 are base-16 hexadecimal numbers, or hex for short. They are very convenient for dealing with bytes, since each hexadecimal digit corresponds to a pattern of 4 bits, and so 1 byte can be described using two hex digits (such as 0x47).
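To see what those hex values stand for, here is a small sketch that shows the same 4 bytes as decimal numbers, as ASCII text, and (for the first byte) as a pattern of 8 bits. The variable names are chosen only for this illustration.

```python
signature = (0x47, 0x49, 0x46, 0x38)

# Hex literals are just another way of writing ordinary integers.
as_decimal = list(signature)

# Interpreted as ASCII codes, the same bytes spell out some text.
as_text = bytes(signature).decode('ascii')

# Each hex digit is 4 bits: 4 is 0100 and 7 is 0111.
first_byte_bits = format(0x47, '08b')

print(as_decimal)
print(as_text)
print(first_byte_bits)
```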
Notice that the file is opened in 'br' mode, which means binary reading mode. When reading a binary file, you call f.read(n), which reads the next n bytes. As with text files, binary file objects use a file pointer to keep track of which byte should be read next in the file.
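A minimal sketch of reading a binary file in chunks follows. It writes 6 bytes to a made-up file named tiny.bin in a temporary folder and then reads them back with successive calls to f.read(n); this is only an illustration, not a real image file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tiny.bin')

    # Write 6 raw bytes; 'wb' means binary writing mode.
    f = open(path, 'wb')
    f.write(bytes([0x47, 0x49, 0x46, 0x38, 0x39, 0x61]))
    f.close()

    f = open(path, 'rb')
    first4 = f.read(4)   # bytes 0 to 3
    next2 = f.read(2)    # continues where the last read stopped
    rest = f.read(2)     # nothing left, so this is empty
    f.close()

print(first4)
print(next2)
print(rest)
```

Each call picks up where the previous one left off, and a read at the end of the file returns an empty bytes object.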
Accessing the individual bytes of binary files is a very low-level operation that, while useful in systems programming, is less useful in higher-level applications programming.
Pickling is often a much more convenient way to deal with binary files. Python’s pickle module lets you easily read and write almost any data structure. For example:
# picklefile.py
import pickle

def make_pickled_file():
    grades = {'alan' : [4, 8, 10, 10],
              'tom'  : [7, 7, 7, 8],
              'dan'  : [5, None, 7, 7],
              'may'  : [10, 8, 10, 10]}
    outfile = open('grades.dat', 'wb')
    pickle.dump(grades, outfile)

def get_pickled_data():
    infile = open('grades.dat', 'rb')
    grades = pickle.load(infile)
    return grades
Essentially, pickling lets you store a data structure on disk using pickle.dump and then retrieve it later with pickle.load. This is an extremely useful feature in many application programs, so you should keep it in mind whenever you need to store binary data.
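The round trip can be checked with a short sketch. It assumes a temporary folder rather than the real grades.dat, and it closes each file explicitly so that the data is fully written before it is read back.

```python
import os
import pickle
import tempfile

# Sample data with nested lists and a None value.
grades = {'alan': [4, 8, 10, 10], 'dan': [5, None, 7, 7]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'grades.dat')

    outfile = open(path, 'wb')     # pickle files are binary, hence 'wb'
    pickle.dump(grades, outfile)
    outfile.close()

    infile = open(path, 'rb')
    restored = pickle.load(infile)
    infile.close()

print(restored == grades)   # same contents?
print(restored is grades)   # same object?
```

The restored dictionary has the same contents as the original but is a separate object built from the data on disk.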
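In addition to data structures, pickling can store functions, although a function is pickled by name rather than by value: only a reference to it is saved, so the function must be defined at the top level of a module that can be imported when the data is loaded.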
You can’t use pickling to read or write binary files that have a specific format, such as GIF files. For such files, you must work byte by byte.
Python has a module called shelve that provides an even higher-level way to store and retrieve data. The shelve module essentially lets you treat a file as if it were a dictionary. For more details, see the Python documentation (http://docs.python.org/dev/3.0/library/shelve.html).
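As a rough sketch of the dictionary-like behavior, the following stores two entries in a shelf and reads them back after reopening it. The shelf name grades and its location in a temporary folder are assumptions for illustration, and using shelve.open in a with statement assumes a reasonably recent Python 3.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, 'grades')

    # Assigning to a key stores the value on disk.
    with shelve.open(base) as db:
        db['alan'] = [4, 8, 10, 10]
        db['tom'] = [7, 7, 7, 8]

    # Reopening the shelf later gives the stored values back.
    with shelve.open(base) as db:
        alan = db['alan']
        names = sorted(db.keys())

print(alan)
print(names)
```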
Python also has a module named sqlite3, which provides an interface to the SQLite database. This lets you write SQL commands to store and retrieve data very much like using a larger database product such as Postgres or MySQL. For more details, see the Python documentation (http://docs.python.org/dev/3.0/library/sqlite3.html).
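To give a flavor of it, here is a minimal sketch that uses an in-memory database, so no file is created. The table name grades, its columns, and the sample rows are all made up for this example.

```python
import sqlite3

# ':memory:' asks for a temporary database held in RAM.
conn = sqlite3.connect(':memory:')

conn.execute('CREATE TABLE grades (name TEXT, score INTEGER)')

# The ? placeholders are filled in from each tuple.
conn.executemany('INSERT INTO grades VALUES (?, ?)',
                 [('alan', 10), ('tom', 7), ('may', 8)])
conn.commit()

cursor = conn.execute(
    'SELECT name, score FROM grades WHERE score >= ? ORDER BY name',
    (8,))
rows = cursor.fetchall()
conn.close()

print(rows)
```

Passing values through ? placeholders, rather than building the SQL text with string formatting, lets the module handle quoting for you.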
Python has good support for accessing the Web. One common task is to have a program automatically read a Web page. This is easily done using the urllib module:
>>> import urllib.request
>>> resp = urllib.request.urlopen('http://www.python.org')
>>> html = resp.read()
>>> html[:25]
b'<!DOCTYPE html PUBLIC "-/'
Now html contains the complete contents of the Web page at www.python.org. It is in HTML, of course, so it looks just like what you would see if you were to use the View Source option on your Web browser. Note the b prefix in the output: resp.read() returns a bytes object rather than a string, so call html.decode() (which assumes UTF-8 unless you name another encoding) to get a string that you can process with Python’s string-manipulation functions.
The urllib module also lets you programmatically post information to Web forms. For details of how to do this and more, see the Python documentation (http://docs.python.org/dev/3.0/howto/urllib2.html).
Reading a Web page into a string is the first step in creating a Web browser. The next major step is to parse the string, that is, to identify and extract titles, paragraphs, tables, and so on. Python provides a basic HTML parsing library in the html.parser module. See the Python documentation (http://docs.python.org/dev/3.0/library/html.parser.html) for details.
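As a small sketch of what parsing looks like, the class below extends HTMLParser to pull the title out of a page. The class name TitleParser and the sample page are made up for illustration, and the page is a literal string so that the example runs without a network connection.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        # Called for the text between tags; keep it only inside <title>.
        if self.in_title:
            self.title += data

page = ('<html><head><title>Hello, Web</title></head>'
        '<body><p>Hi</p></body></html>')

parser = TitleParser()
parser.feed(page)
print(parser.title)
```

The parser calls the handle_ methods as it meets each tag and piece of text, so extracting other elements is a matter of watching for different tag names.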
Another nifty module is webbrowser, which lets you programmatically display a Web page in a browser. For example, when you type this into Python, the Yahoo! home page should pop up in your default Web browser:
>>> import webbrowser
>>> webbrowser.open('http://www.yahoo.com')
True
>>>