Chapter 15

Files

Working with large blocks of text in a file opens up a whole new range of possibilities for interesting programs. The following program solves Jumble puzzles, which ask the solver to figure out an unknown word given a scrambled set of its letters.

One of the difficulties in solving this problem is that long words have so many different reorderings, called permutations. The strategy employed here is to create a unique signature for each word, so that a word has the same signature as all of its permutations. This program also finds anagrams, since typing in an unscrambled word will find all of its anagrams.

Listing 15.1: Solve Jumble

 1 # solvejumble.py
 2
 3 def signature(word):
 4     chars = list(word)
 5     chars.sort()
 6     return chars
 7
 8 def wordlist(fname):
 9     with open(fname) as f:
10         return f.read().split()
11
12 def matches(jumble):
13     sign = signature(jumble)
14     result = []
15     for word in wordlist("dictionary.txt"):
16         if signature(word) == sign:
17             result.append(word)
18     return result
19
20 def main():
21     jumble = input("Enter a scrambled word: ")
22     print("Matches are:", matches(jumble))
23
24 main()
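
As a quick check of the signature idea, the following sketch repeats signature() from Listing 15.1 and compares a few words. The sample words are chosen only for illustration and do not come from the listing or its dictionary file.

```python
def signature(word):
    # Same approach as Listing 15.1: the sorted list of a word's letters.
    chars = list(word)
    chars.sort()
    return chars

# Anagrams share a signature, because sorting erases the original order.
print(signature("listen"))                         # ['e', 'i', 'l', 'n', 's', 't']
print(signature("listen") == signature("silent"))  # True
print(signature("listen") == signature("listed"))  # False
```

Because the comparison in line 16 of the listing is between two sorted lists, any reordering of the same letters matches, no matter how long the word is.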

Files

Python makes it particularly easy to access files. A file is a collection of data that is stored by an operating system in such a way that it can be retrieved by its name and location in the directory or folder hierarchy. The operating system hides all details about exactly where the bytes that make up the file are located on the hard disk or other storage medium.

The built-in function open() gives access to files in Python:

open(fname)

Open file named fname in current directory.

open(fname, "r")

Open file for reading (the default).

open(fname, "w")

Open file for writing.

Each of these returns a file object, which you can call file methods on (listed below). Files opened as above will be in text mode. Specifying "rb" or "wb" opens the file in binary mode, meaning that bytes in the file are read “as-is,” without being interpreted as characters.
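
The difference between text mode and binary mode can be seen in a small sketch like the one below. The file name demo.txt and the temporary directory are assumptions made only so that the example is self-contained.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:      # "w" creates the file or overwrites it
    f.write("abc\n")

with open(path) as f:           # text mode: contents come back as a str
    text = f.read()

with open(path, "rb") as f:     # binary mode: contents come back as bytes
    data = f.read()

print(type(text).__name__)      # str
print(type(data).__name__)      # bytes
```

The exact bytes in data may differ between operating systems, since the newline is stored differently on each, while text is the same string everywhere.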

Because things can go wrong when opening a file—for example, it may be corrupted or not exist—it is important to open files carefully so that errors (raised as exceptions) may be handled gracefully.
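
One possible way to handle such errors is sketched below with a try statement. The helper name read_text and the missing file are hypothetical, and this version only catches OSError, which covers problems such as a missing file or lack of permission; it is not the only reasonable design.

```python
import os
import tempfile

def read_text(fname):
    """Return the contents of fname, or None if it cannot be opened."""
    try:
        with open(fname) as f:
            return f.read()
    except OSError as err:
        # FileNotFoundError and PermissionError are both kinds of OSError.
        print("Could not open", fname, "-", err)
        return None

# A path in a fresh, empty temporary directory, so the file cannot exist.
missing = os.path.join(tempfile.mkdtemp(), "missing.txt")
contents = read_text(missing)
print(contents)                 # None
```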

With Statements

The recommended way to open a file in Python is to use a with statement:

with <expression> as <variable>:
  <body>

Inside the <body> of a with statement, the <variable> takes on the value of the <expression>. However, much more happens behind the scenes. In Listing 15.1, the with statement in line 9 makes sure that the file is closed properly after execution of the <body>, even if an exception occurs.
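
The closing behavior can be observed through the closed attribute of a file object, as in this minimal sketch. The temporary file and the deliberately raised ValueError are assumptions for the demonstration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical demo file

with open(path, "w") as f:
    f.write("hello\n")
    print(f.closed)     # False: still inside the body

print(f.closed)         # True: the with statement closed the file

# The file is closed even when the body raises an exception.
try:
    with open(path) as g:
        raise ValueError("something went wrong in the body")
except ValueError:
    pass

print(g.closed)         # True
```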

File Methods

Here are some of the methods that may be used with a file object named f:

f.read()

Entire contents of f as a string.

f.readline()

Next line from f, including the trailing newline. Returns empty string at end of file.

f.readlines()

List containing all lines from f.

f.write(s)

Write string s to f.

f.close()

Close f. Called automatically by with.

The line-based methods are intended for use with text files.
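
The following sketch shows what each reading method returns for a small two-line file. The file lines.txt is a hypothetical example created in a temporary directory just for this purpose.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical demo file

with open(path, "w") as f:
    f.write("first\n")
    f.write("second\n")

with open(path) as f:
    whole = f.read()          # 'first\nsecond\n'

with open(path) as f:
    line1 = f.readline()      # 'first\n'
    line2 = f.readline()      # 'second\n'
    at_end = f.readline()     # '' signals the end of the file

with open(path) as f:
    lines = f.readlines()     # ['first\n', 'second\n']

print(repr(whole), repr(line1), repr(at_end), lines)
```

The file is reopened before each method because every read continues from where the previous one stopped.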

⇒ Caution: Different operating systems use different escape sequences for the newlines at the end of each line of a text file, called trailing newlines:

Operating System                   Escape Sequence   ASCII (decimal)
Unix® family (Linux®, Mac OS® X)   \n                10
Microsoft Windows                  \r\n              13, 10
Mac OS 9 and earlier               \r                13

Fortunately, Python automatically translates newlines so that in code you can always just use \n.
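
This translation can be checked with a sketch like the one below, which writes raw bytes using all three conventions and then reads the file back in text mode. The file name and its contents are made up for the demonstration, and the sketch relies on the default newline handling of open().

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")  # hypothetical demo file

# Write raw bytes so that each line ends with a different convention.
with open(path, "wb") as f:
    f.write(b"unix\nwindows\r\nold mac\r")

# Reading in text mode translates every line ending to "\n".
with open(path) as f:
    text = f.read()

print(repr(text))   # 'unix\nwindows\nold mac\n'
```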

File Loops

If you need to loop over all the lines in a file, Python allows you to put the file object directly in a for loop:

for <variable> in <file>: # loop over each line in the file
 <body>

This is essentially equivalent to:

for <variable> in <file>.readlines():
 <body>

Therefore, if you use a for loop, each line will contain its newline escape sequence.
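
A short sketch makes the trailing newline visible by printing each line with repr(). The two-line file greek.txt is a hypothetical example created only for this demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "greek.txt")  # hypothetical demo file

with open(path, "w") as f:
    f.write("alpha\nbeta\n")

seen = []
with open(path) as f:
    for line in f:            # the file object itself is the loop sequence
        seen.append(line)
        print(repr(line))     # repr() shows the newline as \n

print(seen)                   # ['alpha\n', 'beta\n']
```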

Splitting Strings

Strings are also objects in Python, and one of their most useful methods in conjunction with reading files is .split():

s.split()

List of “words” in string s.

“Words” in this case are defined as any groups of characters separated by whitespace, which is made up of spaces, tabs, and newlines. For example,

"Hello, world.".split() = ["Hello,", "world."]

Notice that after a .split(), punctuation attaches to the word preceding it.
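
As an illustration, the sketch below splits a string that mixes several spaces, a tab, and a newline. The sample text is invented for this example.

```python
text = "Hello,   world.\tThis is\na test."

# Any run of spaces, tabs, or newlines counts as a single separator.
words = text.split()

print(words)        # ['Hello,', 'world.', 'This', 'is', 'a', 'test.']
print(len(words))   # 6
```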

We will explore other string methods in the next chapter.

Multiple Method Calls

It is common in Python to call methods one right after another. For example, consider line 10 of Listing 15.1:

return f.read().split()

Read the method calls left to right: the result of f.read() is a string, and then the .split() method is called on that string.
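
The same left-to-right reading applies to any chain of calls. The sketch below uses an invented phrase together with the string method .upper() and the list method .count(), neither of which appears in Listing 15.1; they are used here only to show each step producing the object for the next call.

```python
phrase = "to be or not to be"

shouted = phrase.upper()                    # a string: 'TO BE OR NOT TO BE'
words = phrase.upper().split()              # a list of six strings
count = phrase.upper().split().count("BE")  # an integer: 2

print(shouted)
print(words)    # ['TO', 'BE', 'OR', 'NOT', 'TO', 'BE']
print(count)    # 2
```

Each method in a chain must be one that exists for the type returned by the call before it: .split() is a string method, while .count() here is called on the list that .split() returns.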

Exercises

15.1 Use Listing 15.1 to answer these questions:
  (a) Identify the accumulation loop in Listing 15.1, along with the type of accumulator. Explain how you know the type.
  (b) What type of object does the loop in line 15 iterate over? Explain how you know.
  (c) Rewrite the multiple method call in line 10 to use two separate steps.

15.2 Answer these questions about the signature() function in Listing 15.1:
  (a) Give the signature that is calculated for each of these words: file, string, python.
  (b) Does it work to replace lines 4 and 5 with this one method call:
        chars = word.sort()
      Explain why or why not.
  (c) Does it work to combine lines 4 and 5 into a multiple method call like this:
        chars = list(word).sort()
      Explain why or why not.
  (d) Does it work to replace lines 5 and 6 with
        return chars.sort()
      Explain why or why not.
  (e) Does it work to replace the entire function body (lines 4–6) with
        return list(word).sort()
      Explain why or why not.

15.3 Write a program printfile.py that asks for the name of a file and then prints the contents of that file. Write it in four different ways:
  (a) Using read()
  (b) Using readlines()
  (c) Using readline()
  (d) Using a file for loop
  Which of these methods produce double-spaced output, and which do not?

15.4 Fix the double-spacing in the previous exercise by:
  (a) Using the end= option with your print() statements
  (b) Using a slice to remove the trailing newlines

15.5 Write a program reverse.py that prints the lines of a file in reverse order.

15.6 Write a program wotd.py that prints a random word of the day from a dictionary.

15.7 Write a program wc.py that works like the Unix wc program: given a file name, it prints the number of lines, number of words, and number of characters in the file. Ask the user for the file name.

15.8 Write a program cp.py that works like the Unix cp command: given two file names, it copies the first file (source) to the second (destination). Ask the user for the names of the source and destination.

15.9 Write a program avgword.py that computes the average word length for all the words in a file. Do not worry about punctuation, and ask the user for the file name.

15.10 Write a program linenum.py that asks for the name of a file and then prints the contents of the file with line numbers along the left edge.

15.11 Write the Python function decode(msg) that undoes the encoding from Exercise 13.6.

15.12 Modify Listing 15.1 to find the words that have the most anagrams. Warning: this program will take a long time to run. However, see Exercise 18.11.

15.13 Assign each letter a value based on its order in the alphabet, so that a = 1, b = 2, c = 3, and so on. A word’s value is then the sum of the values of its letters. Write a program to find the word with the highest value.