Techniques for Reading Files


As we mentioned at the beginning of the chapter, Python provides several techniques for reading files. You’ll learn about them in this section.

All of these techniques work starting at the current file cursor. That allows us to combine the techniques as we need to.
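As a minimal sketch of combining techniques, the following program writes a small throwaway file (the name cursor_demo.txt and its contents are invented for illustration) and then applies read, readline, and readlines in turn. Each call picks up where the previous one stopped; the individual methods are described in the sections that follow.

```python
import os
import tempfile

# Create a throwaway file; its name and contents are made up for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'cursor_demo.txt')
with open(path, 'w') as demo_file:
    demo_file.write('alpha beta\ngamma\ndelta\n')

with open(path, 'r') as demo_file:
    first_five = demo_file.read(5)       # the first five characters
    rest_of_line = demo_file.readline()  # whatever is left of the first line
    remaining = demo_file.readlines()    # every line after that

print(repr(first_five))
print(repr(rest_of_line))
print(remaining)
```

The three results are 'alpha', ' beta\n', and ['gamma\n', 'delta\n']: no character is read twice, and none is skipped.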

The Read Technique

Use this technique when you want to read the contents of a file into a single string, or when you want to specify exactly how many characters to read. This technique was introduced in Opening a File; here is the same example:

	with open('file_example.txt', 'r') as file:
	    contents = file.read()

	print(contents)

When called with no arguments, method read reads everything from the current file cursor all the way to the end of the file and moves the file cursor to the end of the file. When called with one integer argument, it reads that many characters and moves the file cursor after the characters that were just read. Here is a version of the same program in a file called file_reader_with_10.py; it reads ten characters and then the rest of the file:

	with open('file_example.txt', 'r') as example_file:
	    first_ten_chars = example_file.read(10)
	    the_rest = example_file.read()

	print("The first 10 characters:", first_ten_chars)
	print("The rest of the file:", the_rest)

Method call example_file.read(10) moves the file cursor, so the next call, example_file.read(), reads everything from character 11 to the end of the file.
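Once the file cursor is at the end of the file, read returns an empty string, so repeated calls with an integer argument can walk through a file in fixed-size pieces. Here is a rough sketch of that idea; the two-line file contents and the name chunks are assumptions made for the example, not part of the original program.

```python
import os
import tempfile

# Throwaway file standing in for file_example.txt; the contents are assumed.
path = os.path.join(tempfile.mkdtemp(), 'file_example.txt')
with open(path, 'w') as example_file:
    example_file.write('First line of text.\nSecond line of text.\n')

chunks = []
with open(path, 'r') as example_file:
    chunk = example_file.read(10)
    while chunk != '':  # read returns '' once the cursor is at the end
        chunks.append(chunk)
        chunk = example_file.read(10)

print(chunks)
```

Joining the chunks back together with ''.join(chunks) gives the original text, because each call starts where the previous one left off.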

The Readlines Technique

Use this technique when you want to get a Python list of strings containing the individual lines from a file. Method readlines works much like method read, except that it splits what it reads into a list of strings, one per line. As with read, the file cursor is moved to the end of the file.

This example reads the contents of a file into a list of strings and then prints that list:

	with open('file_example.txt', 'r') as example_file:
	    lines = example_file.readlines()

	print(lines)

Here is the output:

['First line of text.\n', 'Second line of text.\n', 'Third line of text.\n']

Take a close look at that list; you’ll see that each line ends in a \n character. Python does not remove any characters from what is read; it only splits the text into separate strings.

The last line of a file may or may not end with a newline character, as you learned in Exploring String Methods.
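To see the difference a missing final newline makes, here is a small sketch. The file name no_final_newline.txt and its contents are invented for the example; the file is written without a newline after its last line on purpose.

```python
import os
import tempfile

# Throwaway file whose last line deliberately lacks a trailing newline.
path = os.path.join(tempfile.mkdtemp(), 'no_final_newline.txt')
with open(path, 'w') as out_file:
    out_file.write('Mercury\nVenus')

with open(path, 'r') as in_file:
    lines = in_file.readlines()

print(lines)

# Removing only a trailing newline gives uniform strings either way.
names = [line.rstrip('\n') for line in lines]
print(names)
```

The first list is ['Mercury\n', 'Venus'], so code that assumes every item ends in a newline would mishandle the last one. Calling rstrip('\n') is harmless when there is no newline to remove.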

Assume file planets.txt contains the following text:

	Mercury
	Venus
	Earth
	Mars

This example prints the lines in planets.txt backward, from the last line to the first (here, we use built-in function reversed, which returns the items in the list in reverse order):

	>>> with open('planets.txt', 'r') as planets_file:
	...     planets = planets_file.readlines()
	...
	>>> planets
	['Mercury\n', 'Venus\n', 'Earth\n', 'Mars\n']
	>>> for planet in reversed(planets):
	...     print(planet.strip())
	...
	Mars
	Earth
	Venus
	Mercury

We can use the Readlines technique to read the file, sort the lines, and print the planets alphabetically (here, we use built-in function sorted, which returns the items in the list in order from smallest to largest):

	>>> with open('planets.txt', 'r') as planets_file:
	...     planets = planets_file.readlines()
	...
	>>> planets
	['Mercury\n', 'Venus\n', 'Earth\n', 'Mars\n']
	>>> for planet in sorted(planets):
	...     print(planet.strip())
	...
	Earth
	Mars
	Mercury
	Venus
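If the newlines are not needed at all, one option is to strip them once, right after reading, so that later steps work with clean names. This is only a sketch: it recreates planets.txt in a temporary directory so that it runs on its own, and the names backward and alphabetical are ours.

```python
import os
import tempfile

# Recreate planets.txt with the four lines shown above.
path = os.path.join(tempfile.mkdtemp(), 'planets.txt')
with open(path, 'w') as planets_file:
    planets_file.write('Mercury\nVenus\nEarth\nMars\n')

with open(path, 'r') as planets_file:
    # Strip each line as soon as it has been read.
    planets = [line.strip() for line in planets_file.readlines()]

backward = list(reversed(planets))
alphabetical = sorted(planets)

print(backward)
print(alphabetical)
```

Neither reversed nor sorted changes the original list, so planets still holds the names in file order afterward.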

The “For Line in File” Technique

Use this technique when you want to do the same thing to every line from the file cursor to the end of a file. On each iteration, the file cursor is moved to the beginning of the next line.

This code opens file planets.txt and prints the length of each line in that file:

	>>> with open('planets.txt', 'r') as data_file:
	...     for line in data_file:
	...         print(len(line))
	...
	8
	6
	6
	5

Take a close look at the last line of output. There are only four characters in the word Mars, but our program is reporting that the line is five characters long. The reason for this is the same as for method readlines: each of the lines we read from the file has a newline character at the end. We can get rid of it using string method strip, which returns a copy of a string that has leading and trailing whitespace characters (spaces, tabs, and newlines) stripped away:

	>>> with open('planets.txt', 'r') as data_file:
	...     for line in data_file:
	...         print(len(line.strip()))
	...
	7
	5
	5
	4
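Because the loop hands over one line at a time, it can also build up a result as it goes. Here is a small sketch that finds the longest planet name; it assumes planets.txt holds the four lines shown earlier and recreates the file in a temporary directory so that the example runs on its own. The variable name longest is ours.

```python
import os
import tempfile

# Recreate planets.txt with the four lines used in this section.
path = os.path.join(tempfile.mkdtemp(), 'planets.txt')
with open(path, 'w') as data_file:
    data_file.write('Mercury\nVenus\nEarth\nMars\n')

longest = ''
with open(path, 'r') as data_file:
    for line in data_file:
        name = line.strip()  # drop the newline before measuring
        if len(name) > len(longest):
            longest = name

print(longest)
```

Without the call to strip, the comparison would still pick Mercury here, but the stored name would carry its trailing newline.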

The Readline Technique

This technique reads one line at a time, unlike the Readlines technique. Use this technique when you want to read only part of a file.

For example, you might want to treat lines differently depending on context; perhaps you want to process a file that has a header section followed by a series of records, either one record per line or with multiline records.

The following data, taken from the Time Series Data Library [Hyn06], describes the number of colored fox fur pelts produced in Hopedale, Labrador, in the years 1834–1842. (The full data set has values for the years 1834–1925.)

	Coloured fox fur production, HOPEDALE, Labrador, 1834-1842
	#Source: C. Elton (1942) "Voles, Mice and Lemmings", Oxford Univ. Press
	#Table 17, p.265--266
	 22
	 29
	  2
	 16
	 12
	 35
	  8
	 83
	166

The first line contains a description of the data. The next two lines contain comments about the data, each of which begins with a # character. Each piece of actual data appears on a single line.

We’ll use the Readline technique to skip the header, and then we’ll use the For Line in File technique to process the data in the file, counting how many fox fur pelts were produced.

	with open('hopedale.txt', 'r') as hopedale_file:

	    # Read and skip the description line.
	    hopedale_file.readline()

	    # Keep reading and skipping comment lines until we read the first piece
	    # of data.
	    data = hopedale_file.readline().strip()
	    while data.startswith('#'):
	        data = hopedale_file.readline().strip()

	    # Now we have the first piece of data. Accumulate the total number of
	    # pelts.
	    total_pelts = int(data)

	    # Read the rest of the data.
	    for data in hopedale_file:
	        total_pelts = total_pelts + int(data.strip())

	print("Total number of pelts:", total_pelts)

And here is the output:

Total number of pelts: 373

Each call to method readline moves the file cursor to the beginning of the next line.
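When the file cursor is already at the end of the file, readline returns an empty string. The following sketch shows this with a two-line throwaway file; the file name and contents are invented for the example.

```python
import os
import tempfile

# Throwaway two-line file; the name and contents are made up for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'two_lines.txt')
with open(path, 'w') as demo_file:
    demo_file.write('first\nsecond\n')

with open(path, 'r') as demo_file:
    line_one = demo_file.readline()
    line_two = demo_file.readline()
    past_end = demo_file.readline()  # nothing left to read

print(repr(line_one))
print(repr(line_two))
print(repr(past_end))
```

A blank line inside a file is read as '\n', not as '', so an empty string reliably means that there is nothing left. This matters for loops such as the one that skips comments above: if a file held only a description and comments, data would end up as '' and int(data) would raise a ValueError.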

Sometimes leading whitespace is important and you’ll want to preserve it. In the Hopedale data, for example, the integers are right-justified to make them line up nicely. In order to preserve this, you can use rstrip instead of strip to remove the trailing newline; here is a program that prints the data from that file, preserving the whitespace:

	with open('hopedale.txt', 'r') as hopedale_file:

	    # Read and skip the description line.
	    hopedale_file.readline()

	    # Keep reading and skipping comment lines until we read the first piece
	    # of data.
	    data = hopedale_file.readline().rstrip()
	    while data.startswith('#'):
	        data = hopedale_file.readline().rstrip()

	    # Now we have the first piece of data.
	    print(data)

	    # Read the rest of the data.
	    for data in hopedale_file:
	        print(data.rstrip())

And here is the output:

	 22
	 29
	  2
	 16
	 12
	 35
	  8
	 83
	166
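The difference between the stripping methods is easiest to see on a single line. In this sketch, the string stands in for one right-justified value as it might be read from the file; the field width of three characters is an assumption made for the example.

```python
# One right-justified value as it might come from the file (width assumed).
line = '  2\n'

both_ends = line.strip()    # removes leading spaces and the newline
right_only = line.rstrip()  # removes only trailing whitespace
left_only = line.lstrip()   # removes only leading whitespace

print(repr(both_ends))
print(repr(right_only))
print(repr(left_only))
```

The results are '2', '  2', and '2\n'. Only rstrip keeps the leading spaces that make the numbers line up while still removing the newline.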
