Chapter 26
While primarily thought of and designed as a scripting language, Python supports several different programming paradigms. This chapter provides an introduction to functional programming by revisiting a problem from Part II.
Listing 26.1: Word Frequency (Functional Version)
1 # wordfreqfp.py
2
3 import string
4 import itertools
5
6 def removepunc(s):
7 return s.translate(s.maketrans("", "", string.punctuation))
8
9 def getwords(fname):
10 with open(fname) as f:
11 return removepunc(f.read().lower()).split()
12
13 def frequency(words):
14 return list(map(lambda xy: (xy[0], len(list(xy[1]))),
15 itertools.groupby(sorted(words))))
16
17 def main():
18 print(frequency(getwords("moby.txt")))
19
20 main()
Compare this code with Listing 18.1: the style should feel different. Like Listing 18.1, this program should be run from the command line. In fact, you may need to replace the print() in main() with this loop if you receive an out-of-memory error:
for word, freq in frequency(getwords("moby.txt")):
print(word, freq)
Functional programs generally do not use for loops, but it may be necessary here in order to get output. The map() function is described in Chapter 24.
Python supports four of the most common programming paradigms:
Imperative Programming Sequences of statements are used to change the program state. State is usually maintained and updated with assignment statements.
Procedural Programming A form of imperative programming in which procedures (functions in Python) are used to organize tasks and subtasks.
Object-Oriented Programming Objects manage state by communicating with each other using method calls.
Functional Programming Emphasizes functions that transform input to output without side effects that modify program state. In particular, assignment statements (which change program state) are avoided.
Most of the code we have written so far has been procedural and therefore imperative. The past few chapters emphasized object-oriented programming, and in this chapter we explore functional programming. As you begin to study Listing 26.1, notice how it consists almost entirely of function calls. There is no dictionary or other data structure that separately stores the word frequencies. Instead, all of the data is passed from one function to the next until the program is finished.
In fact, you have already seen some features of Python that are functional in style:
Python demonstrates that the lines between these paradigms do not have to be strict.
Because the functional programming style emphasizes functional transformations of input to output, we begin with an overview of the transformations that take place in Listing 26.1.
File |
↓ |
List of words |
↓ |
Sorted list of words |
↓ |
Grouped list of (word, [word, word, ...]) pairs |
↓ |
Output list of (word, count) pairs |
Each of these steps is accomplished by calling one or more Python functions. Exercise 26.2 asks you to identify the function(s) used at each step.
Before getting further into the details of Listing 26.1, it will be helpful to understand iterators and iterables a bit better. Recall from Chapter 4 that the range() function returns an iterable, and in Chapter 24 we saw that the map() function returns an iterator. Neither of these returns an explicit list.
Think of an iterator as a “potential list” that will provide its elements, one by one, when asked for them. An iterable is closely related and is any object that can provide an iterator. The distinction between iterators and iterables is not crucial for us; what matters is that many list operations will in fact work with any iterable or iterator. You may have noticed this already in the documentation. And any time you need the entire list from an iterator or iterable, just use the list() type converter. Line 14 of Listing 26.1 uses list() twice in this way.
Once the list of words has been sorted, we want to group together all words that are the same so that the counts can be totaled at the next stage. The itertools module provides the groupby() function for exactly this purpose:
itertools.groupby(items) |
Group items into pairs: |
|
(key, [item, item, ...]). |
The list sent to groupby() should be sorted first so that all identical items are collected together. The return value of groupby() is an iterator that contains pairs:
(key, <iterator of items with the same key>)
where the second element of each pair is itself an iterator of all the items in the original list that match the same key. Thus, in Listing 26.1, because we are sending a list of words to groupby() and the keys are the words themselves, the output at this stage for a word that occurs three times (such as “wheelbarrow”) will be:
("wheelbarrow", <iterator for 3 occurrences of "wheelbarrow">)
An iterator cannot be asked directly for its length, but it can be converted to a list first, and then the length of the list may be computed. That is the approach taken on line 14 of Listing 26.1.
Lambda expressions are useful for quick, one-time function definitions that are not worth writing a separate function for. They create anonymous functions, that is, functions that do not have a name. The syntax is:
lambda <parameters> : <expression>
The body of the lambda form is limited to be a single expression rather than a complete function body.
A lambda definition is equivalent to:
def name(<parameters>):
return <expression>
except that the function has no name. Lambda expressions are especially useful with map(), as in Listing 26.1.
Listing 26.1 requires a lambda form that takes a tuple as a parameter, because itertools.groupby() returns an iterator of tuples. In line 14, the tuple is named xy, and the components of the tuple are then accessed using indices. Again, the second item in each tuple, xy[1], is an iterator, so the lambda expression converts it to a list before computing its length.
Rather than using while or for loops, functional programs often implement repetition through recursion. A function is recursive if it calls itself, usually on a smaller argument. The sequence of calls must eventually terminate in a base case that does not generate a new recursive call.
For example, the factorial function implemented with recursion might look like this:
def factorial(n):
return n * factorial(n − 1) if n > 1 else 1
Here, the smaller argument in the recursive call is n − 1, and the base case is when n ≤ 1. No state is explicitly being kept in a variable, as it would be in an iterative version.
The Sierpinski triangle is a recursive object, and the .draw() method of Listing 25.1 uses a form of recursion as it calls itself on three new triangle objects.
18.116.23.56