Chapter 18

Dictionaries

Imagine writing a program that counts the number of times each word appears in a large file. We could store the counts in a list, but then how do we keep track of which word uses which index? We could store the words in a second list, so that the count for the word in words[i] is in counts[i], but then we would have to search the entire word list every time we encountered a word to count. A dictionary is much more efficient.

Listing 18.1: Word Frequency

 1 # wordfreq.py
 2
 3 import string
 4
 5 def getwords(fname):
 6     with open(fname) as f:
 7         s = f.read().lower()
 8     s = s.translate(s.maketrans("", "", string.punctuation))
 9     return s.split()
10
11 def frequency(words):
12     d = {}
13     for word in words:
14         if word in d:
15             d[word] += 1
16         else:
17             d[word] = 1
18     return d
19
20 def display(d):
21     for key in sorted(d):
22         print(key, d[key])
23
24 def main():
25     words = getwords("moby.txt")
26     display(frequency(words))
27
28 main()

The display() function in Listing 18.1 is an exception to our habit of printing only in main(). However, its name alerts readers that it will produce output.

Command Line Execution

This program may run more slowly inside IDLE than at the command line. To run a Python program from the command line in Windows, open a command prompt, use cd to navigate to the location of wordfreq.py, and type this, changing “32” to whatever version you have installed:

 c:\python32\python wordfreq.py

In Linux or OS X, open a Terminal, use cd to navigate to the location of wordfreq.py, and run

 python wordfreq.py

Depending on your installation, you may need to use python3 instead of python.

Dictionaries

A dictionary is a mapping that stores keys with associated values. For every key in a dictionary, there is exactly one value. Thus, we think of dictionaries as sets of “key:value” pairs. To look up data in the dictionary, you provide the key, and the dictionary will provide the value.

Dictionaries are written inside curly brackets {}, with a colon between each key and its value and commas separating the key-value pairs. For example, in this dictionary:

 ages = {"Alice":38, "Bob":39, "Chuck":35, "Dave":34}

the keys are the names, stored as strings, and each name is associated with one integer value. A pair of empty brackets {} denotes an empty dictionary.

Dictionaries are also called associative maps because they associate keys with values. Key-value pairs are accessed by key rather than by position (before Python 3.7 they were not stored in any guaranteed order; since 3.7 they keep insertion order), and so dictionaries support different operations and methods than the sequence types we have been using, such as strings, lists, and range() objects.

Dictionary keys must be immutable, but there is no requirement that all keys have the same type. Values in a dictionary may be of any type. Dictionaries themselves are mutable objects in Python.
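As a small illustration of these rules, the sketch below builds a dictionary whose keys have different immutable types and then tries to use a list as a key. The name mixed and the sample entries are made up for this example.

```python
# Keys may mix immutable types; values may be of any type.
mixed = {"pi": 3.14159, 42: "answer", (1, 2): ["a", "list", "value"]}

# A list is mutable, so Python refuses to use it as a key.
try:
    mixed[[1, 2]] = "oops"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(mixed[(1, 2)])       # the list stored under the tuple key
print(list_key_allowed)    # False
```

Strictly speaking, Python requires keys to be hashable; for the built-in types used in this book, that amounts to being immutable.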

Dictionary Operations

The main operations on a dictionary d are to set or retrieve the value stored for a key:

d[key] = value

Set value associated with key in d to value.

d[key]

Get value associated with key in d. Raises KeyError exception if key not in d.

Dictionaries are specifically designed so that the above operations are very fast.
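The two operations above might be used as follows. This is a minimal sketch that reuses the ages dictionary from earlier in the chapter; the key "Zed" is a made-up example of a name that is not present.

```python
ages = {"Alice": 38, "Bob": 39}

ages["Chuck"] = 35     # new key: adds an entry
ages["Bob"] = 40       # existing key: replaces the old value

bob = ages["Bob"]      # retrieves 40

# Looking up a key that is not present raises KeyError.
try:
    ages["Zed"]
    missing = False
except KeyError:
    missing = True

print(bob, missing)    # 40 True
```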

Some of the other operations available are:

len(d)

Number of items in d.

del d[key]

Delete entry for key in d.

key in d

True if key is in d.

key not in d

True if key is not in d.

Because dictionaries are mutable, there are operations like del that change the contents of the dictionary.
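A short sketch of these operations, again using the ages dictionary from earlier in the chapter (the variable names before, after, has_dave, and no_dave are chosen only for this example):

```python
ages = {"Alice": 38, "Bob": 39, "Chuck": 35, "Dave": 34}

before = len(ages)           # 4 entries
del ages["Dave"]             # removes the entry for "Dave" in place
after = len(ages)            # 3 entries

has_dave = "Dave" in ages        # False: membership tests check keys
no_dave = "Dave" not in ages     # True

print(before, after, has_dave, no_dave)   # 4 3 False True
```

Note that in and not in test keys only; they never look at the values.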

Dictionary Methods

Dictionaries are objects and therefore also have their own methods:

d.get(key[, default])

Get value for key if key in d; otherwise, return default (or None).

d.pop(key[, default])

Get and delete value for key if key in d; otherwise, return default (raises KeyError if no default is given).

d.keys()

View of keys in d.

d.values()

View of values in d.

d.items()

View of (key, value) pairs in d.

The .get() method is similar to using d[key], except that it returns a default value (or None) instead of raising a KeyError exception. The .keys(), .values(), and .items() methods return special dictionary views that are iterable and dynamic, meaning that they update as the contents of the dictionary change. The list() type converter will convert a view to a list.
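The following sketch exercises these methods on a small version of the ages dictionary. The key "Zed" is a made-up example of a missing key, and the variable names are chosen only for illustration.

```python
ages = {"Alice": 38, "Bob": 39}

alice = ages.get("Alice")        # 38
zed = ages.get("Zed")            # None: missing key, no default given
zed_default = ages.get("Zed", 0) # 0: missing key, default supplied

names = ages.keys()              # a view, not a copy
ages["Chuck"] = 35               # the view now includes "Chuck"
names_now = list(names)          # convert the view to a list

bob = ages.pop("Bob")            # returns 39 and deletes the entry
gone = ages.pop("Bob", None)     # already deleted, so the default is returned

print(alice, zed, zed_default, bob, gone)
```

Because names is a view, it reflects the dictionary as it is when the view is used, not as it was when .keys() was called.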

Dictionary Loops

Python conveniently allows dictionaries to be used in for loops:

for <variable> in <dict>: # loop over each key
 <body>

This type of loop visits the keys in the order the dictionary stores them: insertion order in Python 3.7 and later, and an arbitrary order in earlier versions. Either way, the loop does not visit the keys in sorted order.
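As a sketch of this loop form, the code below totals the values in the ages dictionary from earlier in the chapter, and then loops over .items() to get each key together with its value. The names total and lines are made up for this example.

```python
ages = {"Alice": 38, "Bob": 39, "Chuck": 35, "Dave": 34}

# Looping over a dictionary yields its keys.
total = 0
for name in ages:
    total += ages[name]

# Looping over .items() yields (key, value) pairs.
lines = []
for name, age in ages.items():
    lines.append("{} is {}".format(name, age))

print(total)    # 146
```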

If you need to loop over a dictionary in sorted order, as in Listing 18.1, use the built-in sorted() function:

sorted(x)

Sorted list from iterable x
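For instance, sorted() applied to a dictionary returns a sorted list of its keys. The sketch below uses a small dictionary of made-up word counts; the second call, which orders the words by count through the key and reverse parameters of sorted(), goes one step beyond what Listing 18.1 does.

```python
# Illustrative counts only, not taken from any real file.
counts = {"whale": 3, "ahab": 2, "sea": 5}

# Sorting a dictionary sorts its keys.
by_word = sorted(counts)

# Sorting the keys by their associated values, largest first.
by_count = sorted(counts, key=counts.get, reverse=True)

for word in by_word:
    print(word, counts[word])
```

Here by_word is ['ahab', 'sea', 'whale'] and by_count is ['sea', 'whale', 'ahab'].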

Exercises

  18.1 Which of the following types may be used as keys in a Python dictionary: integer, float, string, list, boolean, dictionary? Explain your answers.
  18.2 Which of the following types may be used as values in a Python dictionary: integer, float, string, list, boolean, dictionary? Explain your answers.
  18.3 Write one Python statement to create a dictionary named filmratings, where each key is a movie title and the value is your rating of that film from 1 (low) to 5 (high). Include at least four entries.
  18.4 Use Listing 18.1 to answer these questions:
    (a) Describe the purpose of line 8. Look up the .translate() and .maketrans() string methods if you have not used them before.
    (b) Explain why the if statement is necessary at line 14.
    (c) Describe the effect of removing the call to sorted() in line 21.
    (d) Give the type of data referred to by words in line 26. Explain how you know.
  18.5 As it is, Listing 18.1 does not have a separate function to remove punctuation, even though that work is being done. Rewrite the program to use a separate removepunc(s) function.
  18.6 Rewrite the frequency() function in Listing 18.1 to remove the if statement by using .get() with a default value. Discuss the tradeoffs.
  18.7 Research the defaultdict() function from the collections module and use it to rewrite the frequency() function in Listing 18.1. Discuss the tradeoffs.
  18.8 Listing 18.1 makes some poor decisions while counting word frequencies. Fix how each of these is handled:
    (a) Hyphenated words
    (b) Apostrophes
    (c) Dashes ("--" in Moby Dick)

    Doing Exercise 18.5 first may be helpful.

  18.9 Write a program letterfreq.py that computes the frequency with which each letter appears in a given file. Handle case appropriately. Letter frequencies are important for encryption, compression, modern keyboard designs, and some games.

    This exercise provides an alternate solution to Exercise 12.9 to count the nucleotide bases in a DNA string.

  18.10 Write a program dict.py that provides an interactive dictionary, where a user can add or change definitions, as well as look up definitions. The program starts with an empty dictionary, and then entries are gradually created by the user. An interactive session might look like this:
     Welcome to PyDict
      [a]dd or change a definition
      [l]ookup a word
      [q]uit
     Your choice: l
     Word to look up: python
     Not in dictionary
      [a]dd or change a definition
      [l]ookup a word
      [q]uit
     Your choice: a
     Word: python
     Definition: A long, large snake.
      [a]dd or change a definition
      [l]ookup a word
      [q]uit
     Your choice: l
     Word to look up: python
     Definition: A long, large snake.
      [a]dd or change a definition
      [l]ookup a word
      [q]uit
     Your choice: q
     Thanks!
  18.11 Modify Exercise 15.12 to use a dictionary in order to find the word(s) with the most anagrams. Does this approach result in a faster program?