Chapter 15
Working with large blocks of text in a file opens up a whole new range of possibilities for interesting programs. The following program solves Jumble puzzles, which ask the solver to figure out an unknown word given a scrambled set of its letters.
One of the difficulties in solving this problem is that long words have so many different reorderings, called permutations. The strategy employed here is to create a unique signature for each word, so that a word has the same signature as all of its permutations. This program also finds anagrams, since typing in an unscrambled word will find all of its anagrams.
Listing 15.1: Solve Jumble
 1 # solvejumble.py
 2
 3 def signature(word):
 4     chars = list(word)
 5     chars.sort()
 6     return chars
 7
 8 def wordlist(fname):
 9     with open(fname) as f:
10         return f.read().split()
11
12 def matches(jumble):
13     sign = signature(jumble)
14     result = []
15     for word in wordlist("dictionary.txt"):
16         if signature(word) == sign:
17             result.append(word)
18     return result
19
20 def main():
21     jumble = input("Enter a scrambled word: ")
22     print("Matches are:", matches(jumble))
23
24 main()
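The signature idea can be checked on its own, without a dictionary file. The following is a minimal sketch that repeats the signature() function from Listing 15.1; the sample words are hypothetical and chosen only for illustration.

```python
def signature(word):
    """Return the letters of word in sorted order, as in Listing 15.1."""
    chars = list(word)
    chars.sort()
    return chars

# Hypothetical sample words, not taken from any particular dictionary.
print(signature("listen"))                         # ['e', 'i', 'l', 'n', 's', 't']
print(signature("listen") == signature("silent"))  # True: same letters
print(signature("listen") == signature("listed"))  # False: different letters
```

Two words are permutations of each other exactly when their sorted letter lists are equal, which is why comparing signatures avoids generating every reordering.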
Python makes it particularly easy to access files. A file is a collection of data that is stored by an operating system in such a way that it can be retrieved by its name and location in the directory or folder hierarchy. The operating system hides all details about exactly where the bytes that make up the file are located on the hard disk or other storage medium.
The built-in function open() gives access to files in Python:
open(fname) | Open file named fname in current directory. |
open(fname, "r") | Open file for reading (the default). |
open(fname, "w") | Open file for writing. |
Each of these returns a file object, which you can call file methods on (listed below). Files opened as above will be in text mode. Specifying "rb" or "wb" opens the file in binary mode, meaning that bytes in the file are read “as-is,” without being interpreted as characters.
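As a rough illustration of the difference between text and binary modes, the sketch below writes a short file and reads it back both ways. The file name and the use of a temporary directory are assumptions made only so the example does not overwrite anything real.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

with open(path, "w") as f:     # text mode, writing
    f.write("cab\n")

with open(path, "r") as f:     # text mode, reading (same as open(path))
    text = f.read()

with open(path, "rb") as f:    # binary mode, reading
    raw = f.read()

print(type(text))   # <class 'str'>
print(type(raw))    # <class 'bytes'>
```

Text mode returns a string, while binary mode returns a bytes object holding exactly what is stored on disk.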
Because things can go wrong when opening a file—for example, it may be corrupted or not exist—it is important to open files carefully so that errors (raised as exceptions) may be handled gracefully.
The recommended way to open a file in Python is to use a with statement:
with <expression> as <variable>:
    <body>
Inside the <body> of a with statement, the <variable> takes on the value of the <expression>. However, much more happens behind the scenes. In this case, the with statement in line 9 makes sure that the file is closed properly after execution of the <body>, even if an exception occurs.
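The closing behavior can be observed through the closed attribute of a file object. The sketch below is one possible demonstration, using hypothetical file names in a temporary directory; it also shows one way to handle the exception raised when a file does not exist.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "words.txt")

with open(path, "w") as f:
    f.write("apple\n")
print(f.closed)    # True: the file was closed when the body finished

try:
    with open(path) as g:
        raise ValueError("something went wrong in the body")
except ValueError:
    pass
print(g.closed)    # True: closed even though an exception occurred

# Opening a file that does not exist raises FileNotFoundError.
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")
try:
    with open(missing) as h:
        contents = h.read()
except FileNotFoundError:
    contents = ""
    print("Could not find", missing)
```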
Here are some of the methods that may be used with a file object named f:
f.read() | Entire contents of f as a string. |
f.readline() | Next line from f, including the trailing newline. Returns empty string at end of file. |
f.readlines() | List containing all lines from f. |
f.write(s) | Write string s to f. |
f.close() | Close f. Called automatically by with. |
The line-based methods may only be used with text files.
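The methods above can be tried on a small file. This is a minimal sketch, assuming a hypothetical two-line scratch file written to a temporary directory.

```python
import os
import tempfile

# Hypothetical scratch file used only for this example.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")

with open(path, "w") as f:
    f.write("first\n")
    f.write("second\n")

with open(path) as f:
    line1 = f.readline()     # 'first\n'
    line2 = f.readline()     # 'second\n'
    at_end = f.readline()    # '' because the end of the file was reached

with open(path) as f:
    lines = f.readlines()    # ['first\n', 'second\n']

with open(path) as f:
    everything = f.read()    # 'first\nsecond\n'
```

Each with statement reopens the file so that reading starts again from the beginning.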
⇒ Caution: Different operating systems use different escape sequences for the newlines at the end of each line of a text file, called trailing newlines:
Operating System | Escape Sequence | ASCII (decimal) |
---|---|---|
Unix® family (Linux®, Mac OS® X) | \n | 10 |
Microsoft Windows | \r\n | 13, 10 |
Mac OS 9 and earlier | \r | 13 |
Fortunately, Python automatically translates newlines when reading and writing in text mode, so that in code you can always just use \n.
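One way to see the translation is to write raw bytes containing all three line endings in binary mode and then read the file back in text mode. The sketch below assumes a hypothetical scratch file and relies on the default settings of open().

```python
import os
import tempfile

# Hypothetical scratch file used only for this example.
path = os.path.join(tempfile.mkdtemp(), "newlines.txt")

# Binary mode writes the bytes exactly as given, with no translation.
with open(path, "wb") as f:
    f.write(b"unix\nwindows\r\nold mac\r")

# Text mode translates every style of line ending to "\n" on reading.
with open(path) as f:
    text = f.read()

print(repr(text))   # 'unix\nwindows\nold mac\n'
```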
If you need to loop over all the lines in a file, Python allows you to put the file object directly in a for loop:
for <variable> in <file>:    # loop over each line in the file
    <body>
This is essentially equivalent to:
for <variable> in <file>.readlines():
    <body>
Therefore, if you use a for loop, each line will contain its newline escape sequence.
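The following sketch loops over a hypothetical two-line scratch file and collects each line both as it arrives and with its trailing newline removed by the string method rstrip().

```python
import os
import tempfile

# Hypothetical scratch file used only for this example.
path = os.path.join(tempfile.mkdtemp(), "fruit.txt")
with open(path, "w") as f:
    f.write("apple\nbanana\n")

raw_lines = []
clean_lines = []
with open(path) as f:
    for line in f:                             # one line per iteration
        raw_lines.append(line)                 # still ends with "\n"
        clean_lines.append(line.rstrip("\n"))  # trailing newline removed

print(raw_lines)     # ['apple\n', 'banana\n']
print(clean_lines)   # ['apple', 'banana']
```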
Strings are also objects in Python, and one of their most useful methods in conjunction with reading files is .split():
s.split() | List of “words” in string s. |
“Words” in this case are defined as any groups of characters separated by whitespace, which is made up of spaces, tabs, and newlines. For example,
"Hello, world.".split() = ["Hello,", "world."]
Notice that after a .split(), punctuation attaches to the word preceding it.
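A short sketch of .split() on a string that mixes spaces, a tab, and a newline (the sample text is made up for illustration):

```python
# Hypothetical sample text with several kinds of whitespace.
text = "Hello, world.\n  two\twords  "
words = text.split()

print(words)   # ['Hello,', 'world.', 'two', 'words']
```

Runs of whitespace count as a single separator, and leading or trailing whitespace does not produce empty strings.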
We will explore other string methods in the next chapter.
It is common in Python to call methods one right after another. For example, consider line 10 of Listing 15.1:
return f.read().split()
Read the method calls left to right: the result of f.read() is a string, and then the .split() method is called on that string.
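To make the left-to-right reading concrete, the sketch below compares the chained call with the same work done in two separate steps. The scratch file and its contents are hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file standing in for a small word list.
path = os.path.join(tempfile.mkdtemp(), "tiny_dictionary.txt")
with open(path, "w") as f:
    f.write("tale late\nteal\n")

# Chained: call .split() directly on the string returned by f.read().
with open(path) as f:
    chained = f.read().split()

# Step by step: name the intermediate string, then split it.
with open(path) as f:
    contents = f.read()
    stepwise = contents.split()

print(chained)               # ['tale', 'late', 'teal']
print(chained == stepwise)   # True
```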
chars = word.sort()
Explain why or why not.
chars = list(word).sort()
Explain why or why not.
return chars.sort()
Explain why or why not.
return list(word).sort()
Explain why or why not.
Which of these methods appear double-spaced, and which do not?