Chapter 8. Input and Output

To be useful, a program needs to communicate with the world around it. It needs to interact with the user, or read and write files, or access Web pages, and so on. In general, we refer to this as input and output, or I/O for short.

We’ve already seen basic console I/O, which involves printing messages and using the input function to read strings from the user. Now we’ll see some string formatting that lets you make fancy output strings for console I/O and anywhere you need a formatted string.

Then we’ll turn to file I/O, which is all about reading and writing files. Python provides a lot of support for basic file I/O, making it as easy as possible for programmers. In particular, we’ll see how to use text files, binary files, and the powerful pickle module.

Formatting Strings

Python provides a number of different ways to create formatted strings. We will discuss the older string interpolation and the newer format strings.

String interpolation

String interpolation is a simple approach to string formatting that Python borrows from the C programming language. For instance, here’s how you can control the number of decimal places in a float:

>>> x = 1/81
>>> print(x)
0.0123456790123
>>> print('value: %.2f' % x)
value: 0.01
>>> print('value: %.5f' % x)
value: 0.01235

String interpolation expressions always have the form format % values, where format is a string containing one or more formatting commands, each introduced by the % character. In the example 'value: %.2f' % x, the substring %.2f is a formatting command that tells Python to take the first supplied value (x) and display it as a floating-point value with two decimal places.

Conversion specifiers

The character f in the format string is a conversion specifier, and it tells Python how to render the corresponding value. Table 8.1 lists the most commonly used conversion specifiers.

Table 8.1. Some Conversion Specifiers

SPECIFIER   MEANING
d           Integer
o           Octal (base 8) value
x           Lowercase hexadecimal (base 16)
X           Uppercase hexadecimal (base 16)
e           Lowercase float exponential
E           Uppercase float exponential
f           Float
s           String
%           % character

The e, E, and f specifiers give you different ways of representing floats. For example:

>>> x
0.012345679012345678
>>> print('x = %f' % x)
x = 0.012346
>>> print('x = %e' % x)
x = 1.234568e-02
>>> print('x = %E' % x)
x = 1.234568E-02

You can put as many specifiers as you need in a format string, although you must supply exactly one value for each specifier. For example:

>>> a, b, c = 'cat', 3.14, 6
>>> s = 'There\'s %d %ss older than %.2f years' % (c, a, b)
>>> s
"There's 6 cats older than 3.14 years"

As this example shows, the format string acts as a simple template that gets filled in by the values. The values are given in a tuple, and they must be in the order in which you want them replaced.

✓ Tips

  • The d, f, and s conversion specifiers are the most frequently used, so they are the ones worth remembering. In particular, f is the easiest way to control the format of floats.

  • If you need the % character to appear as % itself, then you must type '%%'.
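As a small illustration of the specifiers in Table 8.1, the following sketch formats one integer in several bases and uses %% to produce a literal percent sign. The file name and the function names describe and as_percent are made up for this example.

```python
# specifiers.py (hypothetical file name, for illustration only)
def describe(n):
    # One value per specifier, supplied in order in a tuple.
    return 'dec %d, oct %o, hex %x, HEX %X' % (n, n, n, n)

def as_percent(ratio):
    # %.1f keeps one decimal place; %% produces a single % sign.
    return '%.1f%%' % (ratio * 100)

print(describe(255))      # dec 255, oct 377, hex ff, HEX FF
print(as_percent(0.256))  # 25.6%
```

Note that the same value is repeated four times in the tuple because every specifier consumes exactly one value.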

String Formatting

A second way to create fancy strings in Python is to use format strings with the string method format, which you call on the format string itself, as in template.format(values). For example:

>>> 'My {pet} has {prob}'.format(pet = 'dog', prob='fleas')
'My dog has fleas'

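In a format string, each field within curly braces is replaced by the corresponding argument to format. Matching fields to keyword arguments by name, as in this example, is called named replacement, and it makes the template especially readable.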

You can also replace values by position:

>>> 'My {0} has {1}'.format('dog', 'fleas')
'My dog has fleas'

Or apply formatting codes similar to interpolated strings:

>>> '1/81 = {x}'.format(x=1/81)
'1/81 = 0.0123456790123'
>>> '1/81 = {x:f}'.format(x=1/81)
'1/81 = 0.012346'
>>> '1/81 = {x:.3f}'.format(x=1/81)
'1/81 = 0.012'

You can specify formatting parameters within braces, like this:

>>> 'num = {x:.{d}f}'.format(x=1/81, d=3)
'num = 0.012'
>>> 'num = {x:.{d}f}'.format(x=1/81, d=4)
'num = 0.0123'

This is something you can’t do with regular string interpolation.

✓ Tips

  • If you need the { or } characters to appear as themselves in a format string, type them as {{ and }}.

  • Format strings are more flexible and powerful than string interpolation, but also more complicated. If you are creating only a few simple formatted strings, string interpolation is probably the best choice. Otherwise, format strings are more useful for larger and more complex formatting jobs, such as creating Web pages or form letters for e-mail.
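To make these tips concrete, here is a minimal sketch of a tiny form letter that combines named fields, a precision code, and doubled braces. The template text and the names letter, fill, and fixed are assumptions chosen for illustration.

```python
# A template with named fields; {{ and }} stand for literal braces.
letter = 'Dear {name}, your balance is {amount:.2f} {{USD}}.'

def fill(name, amount):
    # Each named field is replaced by the matching keyword argument.
    return letter.format(name=name, amount=amount)

def fixed(x, places):
    # The nested field {d} supplies the precision at run time.
    return '{x:.{d}f}'.format(x=x, d=places)

print(fill('Ann', 12.5))  # Dear Ann, your balance is 12.50 {USD}.
print(fixed(1/81, 4))     # 0.0123
```

Keeping the template in one variable and the data in function arguments is what makes format strings convenient for larger jobs: the same template can be filled in many times.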

Reading and Writing Files

A file is a named collection of bits stored on a secondary storage device, such as a hard disk, USB drive, flash memory stick, and so on. We distinguish between two categories of files: text files, which are essentially strings stored on disk, and binary files, which are everything else.

Text files have the following characteristics:

  • They are essentially “strings on disk.” Python source code files and HTML files are examples of text files.

  • They can be edited with any text editor. Thus, they are relatively easy for humans to read and modify.

  • They tend to be difficult for programs to read. Typically, relatively complex programs called parsers are needed to read each different kind of text file. For instance, Python uses a special-purpose parser to help read .py files, while HTML parsers are used to read HTML files.

  • They are usually larger than equivalent binary files. This can be a major problem when, for instance, you need to send a large text file over the Internet. Thus, text files are often compressed (for example, into zip format) to speed up transmission and to save disk space.

Binary files have the following characteristics:

  • They are not usually human-readable, at least within a regular text editor. A binary file is displayed in a text editor as a random-looking series of characters. Some kinds of binary files, such as JPEG image files, have special viewers for displaying their content.

  • They usually take up less space than equivalent text files. For instance, a binary file might group the information within it in chunks of 32 bits without using commas, spaces, or any kind of separator character.

  • They are often easier to read and write than text files. While each binary file is different, it’s often not necessary to write complex parsers to read them.

  • They are often tied to a specific program and are often unusable if you lack that program. Some popular binary files may have their file formats published so that you can, if so motivated, write your programs to read and write them. However, this usually requires substantial effort.

Folders

In addition to files, most file systems have folders (or directories), which store files and other folders. Because folders can contain other folders, they form a hierarchical structure that is often quite large and complex.

A pathname is the name used to identify a file or a folder. The full pathname can be quite long. For example, the Python folder on my Windows computer has this full pathname: C:\Documents and Settings\tjd\Desktop\python.

Windows pathnames use a backward slash (\) character to separate names in a path, and they start with the letter of the disk drive (in this example, C:).

On Mac and Linux systems, a forward slash (/) is used to separate names. Plus, there is no drive letter at the start. For example, here is the full pathname for my Python folder on Linux: /home/tjd/Desktop/python.

✓ Tips

  • Recall that if you want to write a \ character in a Python string, it must be doubled:

    'C:\\home\\tjd\\Desktop\\python'

    To avoid the double backslashes, you can use a raw string:

    r'C:\home\tjd\Desktop\python'
  • Getting Python programs to work with both styles of pathnames is a bit tricky, and you should read the documentation for Python’s os.path module for (much!) more information.
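As a first taste of os.path, the sketch below builds a pathname with os.path.join, which inserts whichever separator the current operating system uses. The function name python_folder and the folder names are assumptions for illustration.

```python
import os

def python_folder(home):
    # os.path.join inserts the separator that the current
    # operating system uses, so no slashes are typed by hand.
    return os.path.join(home, 'Desktop', 'python')

path = python_folder('tjd')
print(path)                    # tjd/Desktop/python on Linux
print(os.path.basename(path))  # python
print(os.path.dirname(path))   # tjd/Desktop on Linux
```

Because the separator is never written out, the same code should produce a valid pathname on Windows, Mac, and Linux.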

The current working directory

Many programs use the idea of a current working directory, or cwd. This is simply one directory that has been designated as the default directory: Whenever you do something to a file or a folder without providing a full pathname, Python assumes you mean a file or a folder in the current working directory.
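The following sketch shows the current working directory in action. It creates a file in a temporary folder, then lists that folder without giving any pathname by temporarily making it the cwd. The helper name run_in and the file name notes.txt are made up for this example.

```python
import os
import tempfile

def run_in(folder, action):
    # Make folder the cwd, call action(), then restore the old cwd.
    old = os.getcwd()
    os.chdir(folder)
    try:
        return action()
    finally:
        os.chdir(old)

before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    # Create an empty file inside the temporary folder.
    open(os.path.join(tmp, 'notes.txt'), 'w').close()
    # os.listdir() with no argument lists the cwd.
    names = run_in(tmp, os.listdir)
after = os.getcwd()

print(names)            # ['notes.txt']
print(before == after)  # True
```

Restoring the old cwd matters because the cwd is shared by the whole program: any later code that uses a relative pathname would otherwise look in the wrong folder.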

Examining Files and Folders

Python provides many functions that return information about your computer’s files and folders (its file system). Table 8.2 lists a few of the most useful ones.

Table 8.2. Useful File and Folder Functions

NAME                 ACTION
os.getcwd()          Returns the name of the current working directory
os.listdir(p)        Returns a list of strings of the names of all the files and folders in the folder specified by path p
os.chdir(p)          Sets the current working directory to be path p
os.path.isfile(p)    Returns True just when path p specifies the name of a file, and False otherwise
os.path.isdir(p)     Returns True just when path p specifies the name of a folder, and False otherwise
os.stat(fname)       Returns information about fname, such as its size in bytes and the last modification time

Let’s write a couple of useful functions to see how these work. For instance, a common task is retrieving the files and folders in the current working directory. Writing os.listdir(os.getcwd()) is unwieldy, so we can write this function:

# list.py
import os

def list_cwd():
    return os.listdir(os.getcwd())

The following two related helper functions use list comprehensions to return just the files and folders in the current working directory:

# list.py
def files_cwd():
    return [p for p in list_cwd()
            if os.path.isfile(p)]

def folders_cwd():
    return [p for p in list_cwd()
            if os.path.isdir(p)]

If you just want a list of, say, the .py files in the current working directory, then this will work:

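# list.py
def list_py(path = None):
    if path is None:
        path = os.getcwd()
    # join path and fname so that the file test also works
    # when path is not the current working directory
    return [fname for fname in os.listdir(path)
            if os.path.isfile(os.path.join(path, fname))
            if fname.endswith('.py')]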

This function plays a useful trick with its input parameter: If you call list_py() without a parameter, it runs on the current working directory. Otherwise, it runs on the directory you pass in.

Finally, here’s a function that returns the sum of the sizes of the files in the current working directory:

# list.py
def size_in_bytes(fname):
    return os.stat(fname).st_size

def cwd_size_in_bytes():
    total = 0
    for name in files_cwd():
        total = total + size_in_bytes(name)
    return total
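The same idea can be extended to any folder, not just the current working directory. The sketch below is one possible way to do it; the name folder_size_in_bytes and the demo file names are assumptions. It joins each name to the folder path before testing it, because a bare name is always looked up in the cwd.

```python
import os
import tempfile

def folder_size_in_bytes(path=None):
    if path is None:
        path = os.getcwd()
    total = 0
    for name in os.listdir(path):
        # A bare name is relative to the cwd, so build the full path.
        full = os.path.join(path, name)
        if os.path.isfile(full):
            total += os.stat(full).st_size
    return total

# Demo: two files of known size plus one subfolder.
with tempfile.TemporaryDirectory() as tmp:
    for name, size in [('a.bin', 10), ('b.bin', 32)]:
        f = open(os.path.join(tmp, name), 'wb')
        f.write(b'x' * size)
        f.close()
    os.mkdir(os.path.join(tmp, 'sub'))  # folders are not counted
    demo_total = folder_size_in_bytes(tmp)

print(demo_total)  # 42
```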

✓ Tips

  • To save space, we’ve removed the doc strings for these functions. However, the supplementary code files on Google’s “pythonintro” Web site (http://pythonintro.googlecode.com) all include doc strings.

  • You can tell from the name cwd_size_in_bytes that the return value will be in bytes. Putting the unit of the return value in the function name prevents the need to check the documentation for the units.

  • In general, it’s a good idea to use lots of functions. Even single-line functions such as list_cwd() are useful because they make your programs easier to read and maintain.

  • The os.stat() function is fairly complex and provides much more information about files than we’ve shown here. Check Python’s online documentation for more information (http://docs.python.org/dev/3.0/library/os.html).

Processing Text Files

Python makes it relatively easy to process text files. In general, file processing follows the three steps shown in Figure 8.1: open the file, process it, and close it.

Figure 8.1. The three main steps for processing a text file. A file must be opened before you can use it, and then it should be closed when you are done with it to ensure that all changes are committed to the file.

Reading a text file, line by line

Perhaps the most common way of reading a text file is to read it one line at a time. For example, this prints the contents of a file to the screen:

# printfile.py
def print_file1(fname):
    f = open(fname, 'r')
    for line in f:
        print(line, end = '')
    f.close()  # optional

The first line of the function opens the file: open requires the name of the file you want to process, and also the mode you want it opened in. We are only reading the file, so we open the file in read mode 'r'. Table 8.3 lists Python’s main file modes.

Table 8.3. Python File Modes

CHARACTER   MEANING
'r'         Open for reading (default)
'w'         Open for writing
'a'         Open for appending to the end of the file
'b'         Binary mode
't'         Text mode (default)
'+'         Open a file for reading and writing

The open function returns a special file object, which represents the file on disk. Importantly, open does not read the file into RAM. Instead, in this example, the file is read a line at a time using a for-loop.

The last line of print_file1 closes the file. As the comment notes, this is optional: Python almost always automatically closes files for you. In this case, variable f is local to print_file1, so when print_file1 ends, Python automatically closes and then deletes the file object (not the file itself, of course!) that f points to.

✓ Tips

  • The print call in print_file1 sets end = '' because each line read from a file already ends with a \n character. Thus if we had written just print(line), the file would be displayed with extra blank lines (try it and see!).

  • If errors occur while a file is open, it is possible that the program could end without the file being properly closed. In the next chapter, we will see how to handle such errors and ensure that a file is always correctly closed.
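As a preview, one common way to make sure a file gets closed is Python’s with statement. The sketch below is only an illustration under that assumption; the function name read_lines and the demo file are made up for this example.

```python
import os
import tempfile

def read_lines(fname):
    # The with statement closes f when the block ends,
    # whether it ends normally or because of an error.
    with open(fname, 'r') as f:
        return [line.rstrip('\n') for line in f]

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'story.txt')
    with open(fname, 'w') as out:
        out.write('Mary had a little lamb,\n')
        out.write('and then she had some more.\n')
    lines = read_lines(fname)
    is_closed = out.closed  # True: the with block closed the file

print(lines)
print(is_closed)
```

With this style there is no explicit call to close, so there is no close call to forget.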

Reading a text file as a string

Another common way of reading a text file is to read it as one big string. For example:

# printfile.py
def print_file2(fname):
    f = open(fname, 'r')
    print(f.read())
    f.close()

This is shorter and simpler than print_file1, so many programmers prefer it. However, if the file you are reading is very large, it will take up a lot of RAM, which could slow down, or even crash, your computer.
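If a file might be too large to hold in RAM, one alternative is to read it in fixed-size pieces by calling f.read(n) repeatedly. The following sketch counts the characters in a file that way; the function name count_chars and the default chunk size are assumptions chosen for illustration.

```python
import os
import tempfile

def count_chars(fname, chunk_size=4096):
    total = 0
    with open(fname, 'r') as f:   # with closes f automatically
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size characters
            if not chunk:               # empty string: end of file
                break
            total += len(chunk)
    return total

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'big.txt')
    with open(fname, 'w') as out:
        out.write('abc\n' * 1000)
    n = count_chars(fname, chunk_size=64)

print(n)  # 4000
```

Only one chunk is held in memory at a time, so memory use stays roughly constant no matter how large the file is.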

Finally, we note that many programmers would write this function with a single-line body:

# printfile.py
def print_file3(fname):
    print(open(fname, 'r').read())

While this more compact form might take some getting used to, many programmers like this style because it is both quick to type and still relatively readable.

Writing to a text file

Writing text files is only a little more involved than reading them. For example, this function creates a new text file named story.txt:

# write.py
def make_story1():
    f = open('story.txt', 'w')
    f.write('Mary had a little lamb,\n')
    f.write('and then she had some more.\n')

The 'w' tells Python to open the file in write mode. To put text into the file, you call f.write with the string you want to put into the file. Strings are written to the file in the order in which they are given.

Important: If story.txt already exists, then calling open('story.txt', 'w') will erase its contents! If you want to avoid overwriting story.txt, you need to check whether it exists first:

# write.py
import os
def make_story2():
    if os.path.isfile('story.txt'):
        print('story.txt already exists')
    else:
        f = open('story.txt', 'w')
        f.write('Mary had a little lamb,\n')
        f.write('and then she had some more.\n')

Appending to a text file

One common way of adding strings to a text file is to append them to the end of the file. Unlike 'w' mode, this does not delete anything that might already be in the file. For example:

def add_to_story(line,
                 fname = 'story.txt'):
    f = open(fname, 'a')
    f.write(line)

The important thing to note here is that the file is opened in append mode 'a'.
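To see the difference between the two modes side by side, the sketch below appends two lines to a file and then reopens the same file in 'w' mode. The helper name append_line and the file name log.txt are made up for this example.

```python
import os
import tempfile

def append_line(fname, line):
    # 'a' creates the file if needed and always writes at the end.
    with open(fname, 'a') as f:
        f.write(line + '\n')

with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, 'log.txt')
    append_line(log, 'first')
    append_line(log, 'second')
    with open(log) as f:
        after_append = f.read()

    # Reopening in 'w' mode erases what was there.
    with open(log, 'w') as f:
        f.write('replaced\n')
    with open(log) as f:
        after_write = f.read()

print(repr(after_append))  # 'first\nsecond\n'
print(repr(after_write))   # 'replaced\n'
```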

Inserting a string at the start of a file

Writing a string to the beginning of a file is not as easy as appending one to the end because the Windows, Linux, and Macintosh operating systems don’t directly support inserting text at the beginning of a text file. Perhaps the simplest way to insert text at the beginning of a file is to read the file into a string, insert the new text into the string, and then write the string back to the original file. For example:

def insert_title(title,
                 fname = 'story.txt'):
    f = open(fname, 'r+')

    temp = f.read()
    temp = title + '\n\n' + temp

    f.seek(0)  # reset file pointer
               # to beginning
    f.write(temp)

First, notice that we open the file using the special mode 'r+', which means the file can be both read from and written to. Then we read the entire file into the string temp and insert the title using string concatenation.

Before writing the newly created string back into the file, we first have to tell the file object f to reset its internal file pointer. All text file objects keep track of where they are in the file, and after f.read() is called, the file pointer is at the very end. Calling f.seek(0) puts it back at the start of the file, so that when we write to f, it begins at the start of the file.
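The read, seek, and write sequence can be tried safely on a temporary file. The following is a minimal sketch of that experiment; the function name insert_at_start and the sample text are assumptions for illustration.

```python
import os
import tempfile

def insert_at_start(fname, text):
    with open(fname, 'r+') as f:
        old = f.read()        # the file pointer is now at the end
        f.seek(0)             # move it back to the start
        f.write(text + old)   # overwrite the file from the start

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'story.txt')
    with open(fname, 'w') as f:
        f.write('Mary had a little lamb,\n')
    insert_at_start(fname, 'A Short Story\n\n')
    with open(fname) as f:
        result = f.read()

print(result)
```

This works because the new contents are never shorter than the old ones, so every old character is overwritten. Without the call to f.seek(0), the write would start at the end of the file and the old text would appear twice.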

Processing Binary Files

If a file is not a text file, then it is considered to be a binary file. Binary files are opened in 'b' mode, and you access the individual bytes of the file. For example:

def is_gif(fname):
    f = open(fname, 'br')
    first4 = tuple(f.read(4))
    return first4 == (0x47, 0x49, 0x46,
                      0x38)

This function tests if fname is a GIF image file by checking to see if its first 4 bytes are (0x47, 0x49, 0x46, 0x38) (all GIFs must start with those 4 bytes).

In Python, numbers like 0x47 are base-16 hexadecimal numbers, or hex for short. They are very convenient for dealing with bytes, since each hexadecimal digit corresponds to a pattern of 4 bits, and so 1 byte can be described using two hex digits (such as 0x47).

Notice that the file is opened in 'br' mode, which means binary reading mode. When reading a binary file, you call f.read(n), which reads the next n bytes. As with text files, binary file objects use a file pointer to keep track of which byte should be read next in the file.
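The same check can also be written by comparing bytes objects directly instead of tuples of numbers. The sketch below assumes a hypothetical helper named starts_with and uses two small made-up files, one of which merely begins like a GIF.

```python
import os
import tempfile

GIF_PREFIX = bytes([0x47, 0x49, 0x46, 0x38])  # the same bytes as b'GIF8'

def starts_with(fname, prefix):
    with open(fname, 'rb') as f:
        # f.read(n) returns the next n bytes as a bytes object.
        return f.read(len(prefix)) == prefix

with tempfile.TemporaryDirectory() as tmp:
    fake_gif = os.path.join(tmp, 'fake.gif')
    with open(fake_gif, 'wb') as f:
        f.write(b'GIF89a' + bytes(10))  # GIF-like header plus padding
    text_file = os.path.join(tmp, 'note.txt')
    with open(text_file, 'wb') as f:
        f.write(b'hello')
    gif_result = starts_with(fake_gif, GIF_PREFIX)
    text_result = starts_with(text_file, GIF_PREFIX)

print(GIF_PREFIX)   # b'GIF8'
print(gif_result)   # True
print(text_result)  # False
```

Printing GIF_PREFIX shows why those four hex values were chosen: they are the character codes for the letters G, I, F and the digit 8.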

Pickling

Accessing the individual bytes of binary files is a very low-level operation that, while useful in systems programming, is less useful in higher-level applications programming.

Pickling is often a much more convenient way to deal with binary files. Python’s pickle module lets you easily read and write almost any data structure. For example:

# picklefile.py
import pickle

def make_pickled_file():
    grades = {'alan' : [4, 8, 10, 10],
              'tom' : [7, 7, 7, 8],
              'dan' : [5, None, 7, 7],
              'may' : [10, 8, 10, 10]}

    outfile = open('grades.dat', 'wb')
    pickle.dump(grades, outfile)

def get_pickled_data():
    infile = open('grades.dat', 'rb')
    grades = pickle.load(infile)
    return grades

Essentially, pickling lets you store a data structure on disk using pickle.dump and then retrieve it later with pickle.load. This is an extremely useful feature in many application programs, so you should keep it in mind whenever you need to store binary data.
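Here is a short round trip that shows the idea end to end. It is only a sketch: it writes to a temporary folder rather than the current working directory, and the variable names are made up for this example.

```python
import os
import pickle
import tempfile

grades = {'alan': [4, 8, 10, 10],
          'dan': [5, None, 7, 7]}

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'grades.dat')
    with open(fname, 'wb') as outfile:   # pickles are binary data
        pickle.dump(grades, outfile)
    with open(fname, 'rb') as infile:
        restored = pickle.load(infile)

print(restored == grades)  # True: same contents
print(restored is grades)  # False: a newly built copy
```

The restored dictionary has the same contents as the original, including the None value, but it is a separate object built from the bytes in the file.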

✓ Tips

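  • In addition to data structures, pickling can store functions, although only by reference: the pickle records the function’s name and module, not its code, so the function must still be importable when the data is loaded.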

  • You can’t use pickling to read or write binary files that have a specific format, such as GIF files. For such files, you must work byte by byte.

  • Python has a module called shelve that provides an even higher-level way to store and retrieve data. The shelve module essentially lets you treat a file as if it were a dictionary. For more details, see the Python documentation (http://docs.python.org/dev/3.0/library/shelve.html).

  • Python also has a module named sqlite3, which provides an interface to the sqlite database. This lets you write SQL commands to store and retrieve data very much like using a larger database product such as Postgres or MySQL. For more details, see the Python documentation (http://docs.python.org/dev/3.0/library/sqlite3.html).
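To give a feel for sqlite3, here is a minimal sketch that uses a temporary in-memory database, so no file is created. The table name, column names, and sample rows are assumptions for illustration.

```python
import sqlite3

# ':memory:' asks sqlite for a database that lives only in RAM.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE grades (name TEXT, score INTEGER)')
conn.executemany('INSERT INTO grades VALUES (?, ?)',
                 [('alan', 10), ('tom', 7), ('may', 10)])

# The ? placeholder is filled in from the tuple that follows.
rows = conn.execute(
    'SELECT name FROM grades WHERE score = ? ORDER BY name',
    (10,)).fetchall()
conn.close()

print(rows)  # [('alan',), ('may',)]
```

Each row comes back as a tuple, even when only one column is selected.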

Reading Web Pages

Python has good support for accessing the Web. One common task is to have a program automatically read a Web page. This is easily done using the urllib module:

>>> import urllib.request
>>> resp = urllib.request.urlopen('http://www.python.org')
>>> html = resp.read()
>>> html[:25]
b'<!DOCTYPE html PUBLIC "-/'

Now html contains the complete contents of the Web page at www.python.org. It is in HTML, of course, so it looks just like what you would see if you were to use the View Source option on your Web browser. Note the b prefix in the output: resp.read() returns a bytes object, so call html.decode('utf-8') (assuming the page is encoded as UTF-8) to get a string. Once the Web page is a string on your computer, you can use Python’s string-manipulation functions to extract information from it.
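As a small example of extracting information, the sketch below pulls the title out of a page using only find and slicing. To avoid needing a network connection, it works on a made-up sample of bytes rather than a real download; the function name extract_title and the assumption of UTF-8 encoding are ours.

```python
def extract_title(page_bytes, encoding='utf-8'):
    # Downloaded pages arrive as bytes; decode them to get a string.
    text = page_bytes.decode(encoding)
    start = text.find('<title>')
    end = text.find('</title>')
    if start == -1 or end < start:
        return None  # no complete title element found
    return text[start + len('<title>'):end].strip()

sample = b'<html><head><title> Welcome to Python.org </title></head></html>'
title = extract_title(sample)
missing = extract_title(b'<html><body>no title here</body></html>')

print(title)    # Welcome to Python.org
print(missing)  # None
```

This simple search expects the tags to be written exactly as <title> and </title>, which is one reason real programs usually rely on an HTML parser instead.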

✓ Tips

  • The urllib module also lets you programmatically post information to Web forms. For details of how to do this and more, see the Python documentation (http://docs.python.org/dev/3.0/howto/urllib2.html).

  • Reading a Web page into a string is the first step in creating a Web browser. The next major step is to parse the string—to identify and extract titles, paragraphs, tables, and so on. Python provides a basic HTML parsing library in the html.parser module. See the Python documentation (http://docs.python.org/dev/3.0/library/html.parser.html) for details.

  • Another nifty module is webbrowser, which lets you programmatically display a Web page in a browser. For example, when you type this into Python, the Yahoo! home page should pop up in your default Web browser:

    >>> import webbrowser
    >>> webbrowser.open('http://www.yahoo.com')
    True
    >>>
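To give a rough idea of what parsing with html.parser looks like, here is a minimal sketch that collects the text of a page’s title. The class name TitleParser and the sample HTML are made up for this example, and a real page would need more careful handling.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        # Called once for each opening tag, such as <title>.
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        # Called once for each closing tag, such as </title>.
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        # Called for the text between tags.
        if self.in_title:
            self.title += data

parser = TitleParser()
parser.feed('<html><head><title>Sample Page</title></head>'
            '<body><p>Hi</p></body></html>')
page_title = parser.title

print(page_title)  # Sample Page
```

The parser calls the handle methods as it works through the string, so the program only has to say what to do at each tag.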