Chapter 16

String Methods

Files are not limited to those on your hard drive. The following program retrieves a web page from the Internet, and then uses Python’s string methods to display specific information from it.

Listing 16.1: GCS Menu

 1 # menu.py
 2
 3 import urllib.request
 4
 5 URL = "http://www.central.edu/go/gcsmenu"
 6
 7 def getpage(url):
 8     with urllib.request.urlopen(url) as f:
 9         return str(f.read())
10
11 def gettag(page, tag, start=0):
12     opentag = "<" + tag + ">"
13     closetag = "</" + tag + ">"
14     i = page.find(opentag, start)
15     if i == -1:
16         return None, i
17     j = page.find(closetag, i)
18     return page[i + len(opentag):j], j
19
20 def process(page):
21     heading, i = gettag(page, "h2")
22     result = "\n" + heading.center(60) + "\n\n"
23     day, i = gettag(page, "h3", i)
24     while day is not None:
25         result += day + "\n"
26         meal, i = gettag(page, "p", i)
27         result += " " + meal.strip("<>p") + "\n"
28         day, i = gettag(page, "h3", i)
29     return result
30
31 def main():
32     page = getpage(URL)
33     print(process(page))
34
35 main()

Visit the URL in a web browser and use “View Page Source” to see the raw contents of the page. Listing 16.1 searches for data contained in HTML tags such as <h2>Grand Central Station Menu</h2>.
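To experiment with the tag search without a network connection, the following sketch runs the gettag() function from Listing 16.1 on a short string. The SAMPLE text is made up for illustration and is only an assumption about what such a page might contain.

```python
def gettag(page, tag, start=0):
    # Same logic as gettag() in Listing 16.1.
    opentag = "<" + tag + ">"
    closetag = "</" + tag + ">"
    i = page.find(opentag, start)
    if i == -1:
        return None, i
    j = page.find(closetag, i)
    return page[i + len(opentag):j], j

# Hypothetical page contents, invented for this example.
SAMPLE = "<h2>Grand Central Station Menu</h2><h3>Monday</h3><p>Soup</p>"

heading, i = gettag(SAMPLE, "h2")      # search from the beginning
day, i = gettag(SAMPLE, "h3", i)       # continue after the heading
missing, k = gettag(SAMPLE, "h4")      # this tag does not appear

print(heading)
print(day)
print(missing, k)
```

Passing the returned index back in as start lets each search pick up where the previous one stopped.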

In addition to the new string methods, this example uses the is comparison, multiple assignment and return, constants, a different version of the import statement, and the urllib.request module.

String Methods

Python provides many string methods. A few are highlighted here, but see the documentation for a complete list. These search within a string s:

s.find(t[, start[, end]])

First index where t is a substring in s[start:end].

s.rfind(t[, start[, end]])

Last index where t is a substring in s[start:end].

Both return -1 if t is not found. The optional start and end work like slice indices.

s.startswith(t)

True if t is a prefix of s.

s.endswith(t)

True if t is a suffix of s.

s.count(t)

Number of occurrences of substring t in s.

Square brackets in syntax descriptions, such as those used above, indicate optional elements. With the .find() methods, because there are two sets of brackets, there are actually three options:

s.find(t)          # Search s

s.find(t, i)       # Search s[i:]

s.find(t, i, j)    # Search s[i:j]
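As a small illustration of the search methods and the optional arguments, here is a sketch using the string "banana", chosen only as an example. Note that the indices returned always refer to positions in the whole string s, even when a start or end is given.

```python
s = "banana"

first = s.find("an")            # search all of s
later = s.find("an", 2)         # search s[2:]
absent = s.find("an", 2, 4)     # search s[2:4], which is "na"
last = s.rfind("an")            # last occurrence in s
howmany = s.count("an")

print(first, later, absent, last, howmany)
print(s.startswith("ban"), s.endswith("na"))
```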

These test the contents of s:

s.isalpha()

True if all characters in s are alphabetic.

s.isupper()

True if all letters in s are uppercase (and there is at least one letter).

s.islower()

True if all letters in s are lowercase (and there is at least one letter).

s.isdigit()

True if all characters in s are digits.
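The following sketch tries the four tests on a few sample strings, which are arbitrary choices for illustration. Two details are worth noticing: .isupper() and .islower() look only at letters, so digits and spaces do not affect them, and all four tests are False for the empty string.

```python
samples = ["hello", "HELLO", "Hello", "2024", "ABC 123", ""]

results = {}
for s in samples:
    # Store the four test results together for each sample.
    results[s] = (s.isalpha(), s.isupper(), s.islower(), s.isdigit())
    print(repr(s), results[s])
```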

These each return a modified copy of s:

s.upper()

All uppercase.

s.lower()

All lowercase.

s.capitalize()

First letter capitalized and the rest lowercase.

s.title()

Each word capitalized, rest lowercase.

s.replace(old, new)

All occurrences of substring old replaced by new.

s.center(width)

Centered in a string of width width.

s.strip([chars])

All chars removed from both ends.

s.lstrip([chars])

All chars removed from left end.

s.rstrip([chars])

All chars removed from right end.

If optional chars omitted, whitespace is removed.

⇒ Caution: None of these methods changes the original string; they modify and return a copy.
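A brief sketch of this behavior, using made-up strings: each method call produces a new string, and the original is left untouched unless the result is assigned back to the same variable.

```python
s = "  hello world  "

trimmed = s.strip()             # whitespace removed from both ends
titled = trimmed.title()        # each word capitalized
dashed = "x.y.z".replace(".", "-")
centered = "hi".center(6)       # padded with spaces to width 6
bare = "xxhixx".strip("x")      # only the character "x" is stripped

print(repr(s))                  # the original still has its spaces
print(repr(trimmed), repr(titled))
print(dashed, repr(centered), bare)
```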

The is Comparison

Recall that Python has the special value None to represent “nothing” or “no object.” The gettag() function in Listing 16.1 uses None to indicate that a tag was not found. The proper way to test for None in the calling function on line 24 is with an is comparison:

obj1 is obj2

True if obj1 and obj2 refer to the same object.

obj1 is not obj2

Opposite of is.

The is comparison never checks the value of its object references; it only checks whether or not the references refer to precisely the same object.
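The difference between is and == can be seen with two lists that have equal contents. This is only a sketch with arbitrary values; the variable names are chosen for illustration.

```python
a = [1, 2, 3]
b = [1, 2, 3]    # a separate list with the same contents
c = a            # a second name for the same list as a

same_value = (a == b)     # compares contents
same_object = (a is b)    # compares identity
alias = (a is c)

result = None
found_nothing = (result is None)

print(same_value, same_object, alias, found_nothing)
```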

Multiple Assignment and Return

Python allows you to assign values to more than one variable at a time, as long as the number of values on the right side of the equals sign is the same as the number of variables on the left:

<var1>, <var2>, ..., <varN> = <expr1>, <expr2>, ..., <exprN>

In a multiple assignment, all expressions on the right-hand side are evaluated before assigning values to the variables on the left side, and the assignments are considered to happen simultaneously.
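Because the right-hand side is evaluated first, a multiple assignment can swap two variables without a temporary. The values in this sketch are arbitrary.

```python
x, y = 1, 2
x, y = y, x          # both old values are read before either is replaced
print(x, y)

a, b = 0, 1
a, b = b, a + b      # a + b uses the old value of a
print(a, b)
```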

Recall from Chapter 6 that a function may return more than one value by separating the values with commas, as on line 18 of Listing 16.1. Each call to gettag() in the process() function of Listing 16.1 shows how multiple assignment can be used to store multiple return values. Technically, multiple values are returned in a tuple, defined later in Chapter 19.
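As a sketch of the same pattern outside Listing 16.1, the hypothetical function minmax() below returns two values, and the caller may either unpack them or keep them together.

```python
def minmax(values):
    # Hypothetical helper: returns two values separated by a comma.
    return min(values), max(values)

low, high = minmax([3, 1, 4])   # multiple assignment stores each value
pair = minmax([3, 1, 4])        # a single variable receives the tuple

print(low, high)
print(pair, type(pair))
```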

Constants

Occasionally, it is helpful to create a variable for a value that will never change. Such variables are called constants, and in Python they are usually written in all capitals with underscores between words. Listing 16.1 defines the constant URL in line 5.

Modules may include names that represent constants in addition to functions. For example, the math module includes the constant pi. These module constants do not use the all-caps naming convention.

The string module provides several constants, including:

punctuation

String of all punctuation characters.
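For instance, string.punctuation can be used with the in operator to test individual characters. The sketch below counts the punctuation characters in a sample sentence, which is made up for illustration.

```python
import string

text = "Hello, world! (Is anyone there?)"

count = 0
for ch in text:
    if ch in string.punctuation:
        count += 1

print(string.punctuation)
print(count)
```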

Import without From

A second form of import is used in Listing 16.1:

import <module>

After this statement, any name from the module may be referred to with dot notation:

<module>.<name>

One advantage of this form is that all module names are made available without having to list them. A second is that every use of a module name is easy to find because of the dot notation; for example, see line 8. Finally, this syntax prevents accidentally hiding the same name either as a built-in function or from another module. For these reasons, production code usually uses this form, and we will use it frequently throughout the remainder of the text.
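The name-hiding point can be illustrated with pow, which exists both as a built-in function and in the math module. With the dotted form, both remain available; this is a sketch using arbitrary arguments.

```python
import math

builtin_result = pow(2, 3)        # built-in pow returns an int here
module_result = math.pow(2, 3)    # math.pow always returns a float

print(builtin_result, module_result)

# By contrast, "from math import pow" would make the bare name pow
# refer to math.pow, hiding the built-in function.
```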

Accessing Web Pages

The urllib.request module provides support for reading web pages. Every page on the web is described by its URL or uniform resource locator, essentially its address on the web. Listing 16.1 uses the following function from urllib.request:

urllib.request.urlopen(url)

File-like object accessing url.

The file-like object that is returned supports a read() method, but this read() method returns raw bytes rather than text. The str() type conversion in line 9 converts the bytes to a string.
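It may help to see exactly what str() produces when given bytes. The sketch below uses a literal bytes value in place of a real download, so the contents are an assumption. It also shows the bytes .decode() method, a different technique from the one in Listing 16.1, which interprets the bytes using a named encoding (assumed here to be UTF-8).

```python
raw = b"<h2>Menu</h2>\n"          # stand-in for the result of f.read()

as_text = str(raw)                # the printable form of the bytes object
decoded = raw.decode("utf-8")     # the characters the bytes represent

print(as_text)
print(repr(decoded))
print(len(raw), len(as_text), len(decoded))
```

Comparing the two results is a useful starting point for Exercise 16.2.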

Exercises

16.1 Use Listing 16.1 to:
    (a) Identify the accumulation loop in Listing 16.1, along with the type of accumulator. Explain how you know the type.
    (b) Explain why no else is necessary for the if in line 15.
    (c) Rewrite the import and with statements to use import from.
16.2 Use Listing 16.1 to answer these questions:
    (a) Does the program work correctly?
    (b) Explain why the previous question is meaningful in a way that is different from all previous code examples.
    (c) Determine whether or not the .strip() in line 27 is necessary. Explain your conclusion.
16.3 Use Listing 16.1 to answer these questions:
    (a) Explain why it is helpful for gettag() to return two values instead of one.
    (b) Explain how the slice works in line 18 of gettag().
16.4 Although lowercase is preferable, HTML tags may be uppercase, such as <H2>. Modify Listing 16.1 to handle tags of either case.
16.5 Modify Listing 16.1 to highlight foods or days you are interested in.
16.6 Modify Listing 16.1 to display information from your own campus or town.
16.7 Modify Listing 16.1 to extract information from a web page of your choice.
16.8 Write a program weather.py to print the current weather conditions from the National Weather Service. Ask the user for the zip code, and append their response to the URL http://forecast.weather.gov/zipcity.php?inputstring=.
16.9 Write a function removepunc(s) that returns a copy of the string s with all punctuation removed.
16.10 Look up the string .translate() and .maketrans() methods and use them to rewrite the previous exercise.
16.11 Write a function alphaonly(s) that returns a copy of s that retains only alphabetic characters.
16.12 Write the function mycapitalize(s) to return a copy of s that is capitalized, without using the string .capitalize() method. Test your function by comparing it with the built-in method.
16.13 Write the function mytitle(s) to return a copy of s with each word capitalized, without using the string .title() method. Test your function by comparing it with the built-in method.
16.14 Rewrite the is_dna() function from Exercise 12.4 to handle both upper and lower case.
16.15 Rewrite the is_rna() function from Exercise 12.6 to handle both upper and lower case.
16.16 Rewrite the transcription() function from Exercise 12.7 using string methods.
16.17 Rewrite the countbases() function from Exercise 12.9 using string methods.
16.18 Look up the string .translate() and .maketrans() methods and use them to rewrite the complement() function from Listing 12.1. Use a constant for the string of nucleotides "ACGT".
16.19 Write a function acronym(phrase) that returns the acronym for phrase. For example, if the phrase is “random access memory,” then the acronym is “RAM;” if the phrase is “as soon as possible,” the acronym is “ASAP.” Write a main() to test your code.
16.20 Write a function isidentifier(s) that returns True if s is a legal Python identifier; otherwise, it returns False. Look up the keyword module in the documentation and use it to exclude keywords. Write a main() to test your function.
16.21 Write a function ispalindrome(s) that returns True if s is a palindrome; i.e., a phrase that reads the same backward as forward, excluding any punctuation or whitespace. Write a main() to test your code.
16.22 Write a program that finds the longest palindrome in a dictionary.
16.23 Listing 15.1 assumes that the dictionary file and word are all in the same case (upper or lower). Modify the program to remove this assumption and work for dictionaries and words that use any case.
16.24 Modify Exercise 15.3 to use string methods in order to get correct output.
16.25 Modify Exercise 15.9 to ignore punctuation in its calculation.
16.26 Write a program caesar.py that encodes and decodes text using a Caesar cipher. A Caesar cipher shifts each letter in the message by a fixed number of steps. For example, with a shift of n = 2 steps, every “a” becomes a “c,” every “b” becomes a “d,” and so on, with “z” wrapping around to “b.”

    Write encode(msg, n) and decode(msg, n) functions, and test your program by decoding encoded messages and checking that the results are the same as the originals. Maintain case within messages (so that uppercase stays upper and lowercase stays lower), and handle punctuation appropriately.

16.27 Write a program piglatinfile.py that asks the user for the name of a file and then translates the entire file into Pig Latin. For extra credit, handle punctuation appropriately. Try your program on books from Project Gutenberg (http://www.gutenberg.org/). You may need to run this program from the command line for large files (see page 110).
16.28 Write a program spellcheck.py that asks the user for the name of a file and then looks up every word in that file to see if it is in a given dictionary. Print out the words that are not in the dictionary. Handle punctuation appropriately. Warning: this program will take a long time with large files.