5
Higher-Order Functions

Higher-order functions are a very important feature of the functional programming paradigm. We’ll look at these three varieties of higher-order functions:

  • Functions that accept functions as one (or more) of their arguments

  • Functions that return a function

  • Functions that accept a function and return a function, a combination of the preceding two features

We’ll look at the built-in higher-order functions in this chapter. Separate from these functions, we’ll look at a few of the library modules that offer higher-order functions in later chapters after introducing the concepts here.

Functions that accept functions and create functions include complex callable classes as well as function decorators. We’ll defer consideration of decorators until Chapter 12, Decorator Design Techniques.

In this chapter, we’ll look at the following functions:

  • max() and min()

  • map()

  • filter()

  • iter()

  • sorted()

Additionally, we’ll look at the itemgetter() function in the operator module. This function is useful for extracting an item from a sequence.

We’ll also look at lambda forms that we can use to simplify using higher-order functions.

The max() and min() functions are reductions; they create a single value from a collection. The other functions are mappings. They don’t reduce the input to a single value.

The max(), min(), and sorted() functions have both a meaningful default behavior and a higher-order function behavior. If needed, a function can be provided via the key= argument to customize how they compare items.

The map() and filter() functions take the function as the first positional argument. Here, the function is required because there is no default behavior.

There are a number of higher-order functions in the itertools module. We’ll look at this module in Chapter 8, The Itertools Module, and Chapter 9, Itertools for Combinatorics – Permutations and Combinations.

Additionally, the functools module provides a general-purpose reduce() function. We’ll look at this in Chapter 10, The Functools Module, because it requires a bit more care to use. We need to avoid transforming an inefficient algorithm into a nightmare of excessive processing.

5.1 Using max() and min() to find extrema

The max() and min() functions each have a dual life. They are simple functions that apply to collections. They are also higher-order functions. We can see their default behavior as follows:

>>> max(1, 2, 3) 
3 
>>> max((1,2,3,4)) 
4

Both functions will accept an indefinite number of arguments. The functions are designed to also accept a sequence or an iterable as the only argument and locate the max (or min) of that iterable. When applied to a mapping collection, they will locate the maximum (or minimum) key value.
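For example, when applied to a dict, they compare the keys; to compare the values instead, we can iterate over the values() view:

>>> max({"a": 1, "b": 2, "z": 0})
'z'
>>> max({"a": 1, "b": 2, "z": 0}.values())
2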

They also do something more sophisticated. Let’s say we have our trip data from the examples in Chapter 4, Working with Collections. We have a function that will generate a sequence of tuples that looks as follows:

[ 
 ((37.54901619777347, -76.33029518659048), (37.840832, -76.273834), 17.7246), 
 ((37.840832, -76.273834), (38.331501, -76.459503), 30.7382), 
 ((38.331501, -76.459503), (38.845501, -76.537331), 31.0756), 
 ((36.843334, -76.298668), (37.549, -76.331169), 42.3962), 
 ((37.549, -76.331169), (38.330166, -76.458504), 47.2866), 
 ((38.330166, -76.458504), (38.976334, -76.473503), 38.8019) 
]

Each tuple in this collection has three values: a starting location, an ending location, and a distance. The locations are given as latitude-longitude pairs. The north latitudes are positive and the west longitudes are negative; these are points along the US East Coast, about 76° west. The distances between points are in nautical miles.

We have three ways of getting the maximum and minimum distances from this sequence of values. They are as follows:

  • Extract the distance with a generator function. This will give us only the distances, as we’ve discarded the other two attributes of each leg. This won’t work out well if we have any additional processing requirements based on the latitude or longitude.

  • Use the unwrap(process(wrap())) pattern. This will give us the legs with the longest and shortest distances. From these, we can extract the distance or the point as needed.

  • Use the max() and min() functions as higher-order functions, inserting a function that does the extraction of the important distance values. This will also preserve the original objects with all of their attributes.

To provide context, the following script builds the overall trip:

>>> from Chapter04.ch04_ex1 import ( 
...    floats_from_pair, float_lat_lon, row_iter_kml, haversine, legs 
... ) 
>>> import urllib.request 
>>> data = "file:./Winter%202012-2013.kml" 
 
>>> with urllib.request.urlopen(data) as source: 
...     path = floats_from_pair(float_lat_lon(row_iter_kml(source))) 
...     trip = list( 
...         (start, end, round(haversine(start, end), 4)) 
...         for start, end in legs(path) 
...     )

The resulting trip object is a list object, containing the individual legs. Each leg is a three-tuple with the starting point, the ending point, and the distance, computed with the haversine() function. The legs() function creates start-end pairs from the overall path of points in the original KML file. The list() function consumes values from the lazy generator to materialize the list of legs.

Once we have the trip object, we can extract distances and compute the maximum and minimum of those distances. The code to do this with a generator function looks as follows:

>>> longest = max(dist for start, end, dist in trip) 
>>> shortest = min(dist for start, end, dist in trip)

We’ve used a generator function to extract the relevant item from each leg of the trip tuple. We’ve had to repeat the generator expression because the expression dist for start, end, dist in trip can be consumed only once.

Here are the results based on a larger set of data than was shown previously:

>>> longest 
129.7748 
>>> shortest 
0.1731

It may help to refer to Chapter 2, Introducing Essential Functional Concepts, for examples of the wrap-process-unwrap design pattern.

The following is a version of the unwrap(process(wrap())) pattern applied to this data:

from collections.abc import Iterator, Iterable 
from typing import Any 
 
def wrap(leg_iter: Iterable[Any]) -> Iterable[tuple[Any, Any]]: 
    return ((leg[2], leg) for leg in leg_iter) 
 
def unwrap(dist_leg: tuple[Any, Any]) -> Any: 
    distance, leg = dist_leg 
    return leg

We can use these functions as follows:

>>> longest = unwrap(max(wrap(trip))) 
>>> longest 
((27.154167, -80.195663), (29.195168, -81.002998), 129.7748) 
 
>>> short = unwrap(min(wrap(trip))) 
>>> short 
((35.505665, -76.653664), (35.508335, -76.654999), 0.1731)

The final and most important form uses the higher-order function feature of the max() and min() functions. We’ll define a helper function first and then use it to reduce the collection of legs to the desired summaries by executing the following code snippet:

from typing import Any

def by_dist(leg: tuple[Any, Any, Any]) -> Any:
    start, end, dist = leg
    return dist

We can use this function as the key= argument value to the built-in max() function. It looks like the following:

>>> longest = max(trip, key=by_dist) 
>>> longest 
((27.154167, -80.195663), (29.195168, -81.002998), 129.7748) 
 
>>> short = min(trip, key=by_dist) 
>>> short 
((35.505665, -76.653664), (35.508335, -76.654999), 0.1731)

The by_dist() function picks apart the three items in each leg tuple and returns the distance item. We’ll use this with the max() and min() functions.

The max() and min() functions both accept an iterable and a function as arguments. The keyword parameter key= is used by many of Python’s higher-order functions to provide a function that will be used to extract the necessary key value.

5.1.1 Using Python lambda forms

In many cases, the definition of a helper function seems to require too much code. Often, we can distill the key= function into a single expression. It can seem wasteful to have to write both def and return statements to wrap a single expression.

Python offers the lambda form as a way to simplify using higher-order functions. A lambda form allows us to define a small, anonymous function. The function’s body is limited to a single expression.

The following is an example of using a simple lambda expression as the key= function:

>>> longest = max(trip, key=lambda leg: leg[2]) 
>>> shortest = min(trip, key=lambda leg: leg[2])

The lambda we’ve used will be given an item from the sequence; in this case, each leg three-tuple will be given to the lambda. The lambda argument variable, leg, is assigned and the expression, leg[2], is evaluated, plucking the distance from the three-tuple.

In cases where a lambda is used exactly once, this form is ideal. When reusing a lambda, it’s important to avoid copy and paste. In the above example, the lambda is repeated, a potential software maintenance nightmare. What’s the alternative?

We can assign lambdas to variables, by doing something like this:

start = lambda x: x[0] 
end = lambda x: x[1] 
dist = lambda x: x[2]

Each of these lambda forms is a callable object, similar to a defined function. They can be used like a function.

The following is an example at the interactive prompt:

>>> longest = ((27.154167, -80.195663), (29.195168, -81.002998), 129.7748) 
>>> dist(longest) 
129.7748

Here are two reasons for avoiding this technique:

  • PEP 8, the style guide for Python code, advises against assigning lambda objects to variables. See https://peps.python.org/pep-0008/ for more information.

  • The operator module provides a generic item getter, itemgetter(). This is a higher-order function that returns a function we can use instead of a lambda object.

To extend this example, we’ll look at how we get the latitude or longitude value of the starting or ending point.

The following is a continuation of the interactive session:

>>> from operator import itemgetter 
>>> start = itemgetter(0) 
>>> start(longest) 
(27.154167, -80.195663) 
 
>>> lat = itemgetter(0) 
>>> lon = itemgetter(1) 
>>> lat(start(longest)) 
27.154167

We’ve imported the itemgetter() function from the operator module. The value returned by this function is a function that will grab the requested item from a sequence. In the first part of the example, the start() function will extract item 0 from a sequence.

The lat() and lon() functions, similarly, are created by the itemgetter() function. Note that the complexity of the nested tuples in the data structure must be carefully paralleled with the itemgetter() functions.

There’s no clear advantage to using lambda objects or itemgetter() functions as a way to extract fields over defining a typing.NamedTuple class or a dataclass. Using lambdas (or better, the itemgetter() function) does allow the code to rely on prefix function notation, which might be easier to read in a functional programming context. We can gain a similar advantage by using the operator.attrgetter() function to extract a specific attribute from a typing.NamedTuple class or dataclass. Using attrgetter() duplicates a name as a string. For example, a typing.NamedTuple class with an attribute of lat may also use attrgetter('lat'); this can make it slightly harder to locate all references to an attribute when refactoring.
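As a small sketch of this attrgetter() alternative, using a hypothetical Point class:

>>> from typing import NamedTuple
>>> from operator import attrgetter

>>> class Point(NamedTuple):
...     lat: float
...     lon: float

>>> get_lat = attrgetter("lat")
>>> get_lat(Point(lat=27.154167, lon=-80.195663))
27.154167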

5.1.2 Lambdas and the lambda calculus

If Python were a purely functional programming language, it would be necessary to explain Church’s lambda calculus, and the technique invented by Haskell Curry that we call currying. Python, however, doesn’t stick closely to the lambda calculus. Functions are not curried to reduce them to single-argument lambda forms.

Python lambda forms are not restricted to single-argument functions. They can have any number of arguments. They are restricted to a single expression, however.

We can, using the functools.partial function, implement currying. We’ll save this for Chapter 10, The Functools Module.
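As a tiny preview, the partial() function binds one argument of a two-argument function, producing a new one-argument function; the multiply() function here is purely illustrative:

>>> from functools import partial

>>> def multiply(a: float, b: float) -> float:
...     return a * b

>>> double = partial(multiply, 2)
>>> double(21)
42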

5.2 Using the map() function to apply a function to a collection

A scalar function maps values from a domain to a range. When we look at the math.sqrt() function, as an example, we’re looking at a mapping from a float value, x, to another float value, y = sqrt(x), such that y² = x. The domain is limited to non-negative values for the math module. When using the cmath module, any number can be used, and the results can be complex numbers.
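We can see the difference at the REPL:

>>> import math
>>> math.sqrt(2)
1.4142135623730951
>>> import cmath
>>> cmath.sqrt(-2)
1.4142135623730951j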

The map() function expresses a similar concept; it maps values from one collection to create another collection. It assures that the given function is used to map each individual item from the domain collection to the range collection—this is the ideal way to apply a built-in function to a collection of data.

Our first example involves parsing a block of text to get a sequence of numbers. Let’s say we have the following chunk of text:

>>> text= """ 
... 2 3 5 7 11 13 17 19 23 29 
... 31 37 41 43 47 53 59 61 67 71 
... 73 79 83 89 97 101 103 107 109 113 
... 127 131 137 139 149 151 157 163 167 173 
... 179 181 191 193 197 199 211 223 227 229 
... """

We can restructure this text using the following generator expression:

>>> data = list( 
...     v 
...     for line in text.splitlines() 
...         for v in line.split() 
... )

This will split the text into lines. For each line, it will split the line into space-delimited words and iterate through each of the resulting strings. The results look as follows:

['2', '3', '5', '7', '11', '13', '17', '19', '23', '29',
'31', '37', '41', '43', '47', '53', '59', '61', '67', '71',
'73', '79', '83', '89', '97', '101', '103', '107', '109', '113',
'127', '131', '137', '139', '149', '151', '157', '163', '167',
'173', '179', '181', '191', '193', '197', '199', '211', '223',
'227', '229']

We still need to apply the int() function to each of the string values. This is where the map() function excels. Take a look at the following code snippet:

>>> list(map(int, data)) 
[2, 3, 5, 7, 11, 13, 17, 19, ..., 229]

The map() function applied the int() function to each value in the collection. The result is a sequence of numbers instead of a sequence of strings.

The map() function’s result is an iterable. The map() function can also accept any type of iterable as input.

The idea here is that any Python function can be applied to the items of a collection using the map() function. There are a lot of built-in functions that can be used in this map-processing context.
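For example, we can apply the built-in len() function to the same list of strings to see the number of digits in each value:

>>> list(map(len, data))
[1, 1, 1, 1, 2, 2, 2, 2, ..., 3, 3]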

5.2.1 Working with lambda forms and map()

Let’s say we want to convert our trip distances from nautical miles to statute miles. We want to multiply each leg’s distance by 6076.12/5280, which is about 1.15078.

We’ll rely on a number of itemgetter functions to extract data from the data structure. We can combine extractions with computation of new values. We can do this calculation with the map() function as follows:

>>> from operator import itemgetter 
>>> start = itemgetter(0) 
>>> end = itemgetter(1) 
>>> dist = itemgetter(2) 
>>> sm_trip = map( 
...     lambda x: (start(x), end(x), dist(x) * 6076.12 / 5280), 
...     trip 
... )

We’ve defined a lambda that will be applied to each leg in the trip by the map() function. The lambda will use the itemgetter() functions to separate the start, end, and distance values from each leg’s tuple. It will compute a revised distance and assemble a new leg tuple from the start, end, and statute-mile distance.

This is precisely like the following generator expression:

>>> sm_trip = ( 
...     (start(x), end(x), dist(x) * 6076.12 / 5280) 
...     for x in trip 
... )

We’ve done the same processing on each item in the generator expression.

Using the built-in map() function or a generator expression will produce identical results and have nearly identical performance. The choice of using lambdas, named tuples, defined functions, the operator.itemgetter() function, or generator expressions is entirely a matter of how to make the resulting application program succinct and expressive.

5.2.2 Using map() with multiple sequences

Sometimes, we’ll have two collections of data that need to be parallel to each other. In Chapter 4, Working with Collections, we saw how the zip() function can interleave two sequences to create a sequence of pairs. In many cases, we’re really trying to do something like the following:

map(function, zip(one_iterable, another_iterable))

We’re creating argument tuples from two (or more) parallel iterables and applying a function to the argument tuple. This can be awkward because the given function, function(), will receive a single two-tuple as its argument; the values will not be distributed across separate parameters.

As a consequence, we can think about using the following technique to decompose the tuple into two individual parameters:

( 
    function(x, y) 
    for x, y in zip(one_iterable, another_iterable) 
)

Here, we’ve replaced the map() function with an equivalent generator expression. for x, y decomposes the two-tuples so we can apply them to each parameter of the function.

There is a better approach that is already available to us. Let’s look at a concrete example of the alternate approach.

In Chapter 4, Working with Collections, we looked at trip data that we extracted from an XML file as a series of waypoints. We needed to create legs from this list of waypoints that show the start and end of each leg.

The following is a simplified version that uses the zip() function applied to two slices of a sequence:


>>> waypoints = range(4)
>>> zip(waypoints, waypoints[1:])
<zip object at ...>

>>> list(zip(waypoints, waypoints[1:]))
[(0, 1), (1, 2), (2, 3)]

We’ve created a sequence of pairs drawn from a single flat list. Each pair will have two adjacent values. The zip() function stops when the shorter list is exhausted. This zip(x, x[1:]) pattern only works for materialized sequences and the iterable created by the range() function. It won’t work for other iterables because iterators don’t implement the slicing operation.
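When the source is an iterator, the itertools module offers alternatives. Since Python 3.10, for example, the itertools.pairwise() function produces the same adjacent pairs without slicing:

>>> from itertools import pairwise
>>> list(pairwise(iter(range(4))))
[(0, 1), (1, 2), (2, 3)]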

We created pairs so that we can apply the haversine() function to each pair to compute the distance between the two points on the path. The following is how it looks in one sequence of steps:

>>> from Chapter04.ch04_ex1 import ( 
...    floats_from_pair, float_lat_lon, row_iter_kml, haversine 
... ) 
>>> import urllib.request 
 
>>> data = "file:./Winter%202012-2013.kml" 
>>> with urllib.request.urlopen(data) as source: 
...     path_gen = floats_from_pair( 
...         float_lat_lon(row_iter_kml(source))) 
...     path = list(path_gen) 
 
>>> distances_1 = map( 
...     lambda s_e: (s_e[0], s_e[1], haversine(*s_e)), 
...     zip(path, path[1:]) 
... )

We’ve built a list of waypoints, and labeled this with the path variable. This is an ordered sequence of latitude-longitude pairs. As we’re going to use the zip(path, path[1:]) design pattern, we must have a materialized sequence and not an iterable.

The results of the zip() function will be pairs that have a start and end. We want our output to be a triple with the start, end, and distance. The lambda we’re using will decompose the original start-end two-tuple and create a new three-tuple from the start, end, and distance.

We can simplify this by using a clever feature of the map() function, which is as follows:

>>> distances_2 = map( 
...     lambda s, e: (s, e, haversine(s, e)), 
...     path, path[1:] 
... )

Note that we’ve provided a lambda object and two iterables to the map() function. The map() function will take the next item from each iterable and apply those two values as the arguments to the given function. In this case, the given function is a lambda that creates the desired three-tuple from the start, end, and distance.

The formal definition for the map() function states that it will do star-map processing with an indefinite number of iterables. It will take items from each iterable to create a tuple of argument values for the given function. This saves us from having to use the zip() function to combine sequences.
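A small illustration of this behavior with two parallel iterables; the related itertools.starmap() function does the same with pre-zipped tuples:

>>> list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
[11, 22, 33]
>>> from itertools import starmap
>>> list(starmap(lambda a, b: a + b, zip([1, 2, 3], [10, 20, 30])))
[11, 22, 33]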

5.3 Using the filter() function to pass or reject data

The job of the filter() function is to apply a decision function, called a predicate, to each value in a collection. When the predicate function’s result is true, the value is passed; otherwise, the value is rejected. The itertools module includes filterfalse() as a variation on this theme. Refer to Chapter 8, The Itertools Module, to understand the usage of the itertools module’s filterfalse() function.
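As a quick preview, filterfalse() passes exactly the values that the predicate rejects:

>>> from itertools import filterfalse
>>> list(filter(lambda x: x % 2 == 0, range(6)))
[0, 2, 4]
>>> list(filterfalse(lambda x: x % 2 == 0, range(6)))
[1, 3, 5]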

We might apply this to our trip data to create a subset of legs that are over 50 nautical miles long, as follows:

>>> long_legs = list( 
...     filter(lambda leg: dist(leg) >= 50, trip) 
... )

The predicate lambda will be True for long legs, which will be passed. Short legs will be rejected. The output contains the 14 legs that pass this distance test.

This kind of processing clearly segregates the filter rule (lambda leg: dist(leg) >= 50) from any other processing that creates the trip object or analyzes the long legs.

For another simple example, look at the following code snippet:

>>> filter(lambda x: x % 3 == 0 or x % 5 == 0, range(10)) 
<filter object at ...> 
>>> sum(_) 
23

We’ve defined a small lambda to check whether a number is a multiple of three or a multiple of five. We’ve applied that function to an iterable, range(10). The result is an iterable sequence of numbers that are passed by the decision rule.

The numbers for which the lambda is True are [0, 3, 5, 6, 9], so these values are passed. As the lambda is False for all other numbers, they are rejected.

The _ variable is a special feature of Python’s REPL. It is implicitly set to the result of an expression. In the previous example, the filter(...) result was assigned to _. On the next line, sum(_) consumed the values from the filter(...) result.

This is only available in the REPL, and exists to save us a little bit of typing when we’re exploring complex functions interactively.

This can also be done with a generator expression by executing the following code:

>>> list(x for x in range(10) if x % 3 == 0 or x % 5 == 0) 
[0, 3, 5, 6, 9]

We can formalize this using the following set comprehension notation:

{x | 0 ≤ x < 10 ∧ (x ≡ 0 mod 3 ∨ x ≡ 0 mod 5)}

This says that we’re building a collection of x values such that x is in range(10) and x % 3 == 0 or x % 5 == 0. There’s a very elegant symmetry between the filter() function and formal mathematical set comprehensions.

We often want to use the filter() function with defined functions instead of lambda forms. The following is an example of reusing a predicate defined earlier:

>>> from Chapter02.ch02_ex1 import isprimeg 
 
>>> list(filter(isprimeg, range(100))) 
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

In this example, we imported a function from another module called isprimeg(). We then applied this function to a collection of values to pass the prime numbers and reject any non-prime numbers from the collection.

This can be a remarkably inefficient way to generate a table of prime numbers. The superficial simplicity of this is the kind of thing lawyers call an attractive nuisance. It looks like it might be fun, but it doesn’t scale well at all. The isprimeg() function duplicates all of the testing effort for each new value. Some kind of cache is essential to prevent redoing the testing of primality. A better algorithm is the Sieve of Eratosthenes; this algorithm retains the previously located prime numbers and uses them to prevent recalculation.

For more information on primality testing, and this algorithm for finding small prime numbers, see https://primes.utm.edu/prove/prove2_1.html.
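The following is a minimal sketch of the sieve, assuming a limit of at least 2; the primes_up_to() name is our own, not part of this chapter’s examples:

def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: retain located primes to avoid re-testing."""
    # Assumes limit >= 2.
    is_candidate = [True] * (limit + 1)
    is_candidate[0] = is_candidate[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_candidate[n]:
            # n is prime; mark its multiples as composite.
            for multiple in range(n * n, limit + 1, n):
                is_candidate[multiple] = False
    return [n for n, flag in enumerate(is_candidate) if flag]

It produces the same values as the filter() example, without repeating any division tests:

>>> primes_up_to(30)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]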

5.3.1 Using filter() to identify outliers

In the previous chapter, we defined some useful statistical functions to compute mean and standard deviation and normalize a value. We can use these functions to locate outliers in our trip data. What we can do is apply the mean() and stdev() functions to the distance value in each leg of a trip to get the population mean and standard deviation.

We can then use the z() function to compute a normalized score for each leg. If the absolute value of the normalized score is more than 3, the data is potentially far from the mean. If we reject these outliers, we have a more uniform set of data that’s less likely to harbor reporting or measurement errors.
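For reference, the following is a minimal sketch of what these helpers might look like; the Chapter 4 definitions may differ in detail:

import math
from collections.abc import Sequence

def mean(samples: Sequence[float]) -> float:
    return sum(samples) / len(samples)

def stdev(samples: Sequence[float]) -> float:
    # Population standard deviation.
    m = mean(samples)
    return math.sqrt(sum((x - m) ** 2 for x in samples) / len(samples))

def z(x: float, m_x: float, s_x: float) -> float:
    # Number of standard deviations by which x differs from the mean.
    return (x - m_x) / s_x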

The following is how we can tackle this:

>>> from Chapter04.ch04_ex3 import mean, stdev, z 
 
>>> dist_data = list(map(dist, trip)) 
>>> μ_d = mean(dist_data) 
>>> σ_d = stdev(dist_data) 
 
>>> outlier = lambda leg: abs(z(dist(leg), μ_d, σ_d)) > 3 
 
>>> list(filter(outlier, trip))

We’ve mapped the distance function to each leg in the trip collection. The dist() function is the function created by itemgetter(2). As we’ll do several things with the result, we must materialize a list object. We can’t rely on the iterator, as the first function in this sequence of steps will consume all of the iterator’s values. We can then use this extraction to compute population statistics μ_d and σ_d with the mean and standard deviation.

Given the mean and standard deviation values, we used the outlier lambda to filter our data. If the normalized value is too large, the data is an outlier. The threshold for "too far from the mean" can vary based on the kind of distribution. For a normal distribution, the probability of a value being within three standard deviations from the mean is 0.997.

The result of list(filter(outlier, trip)) is a list of two legs that are quite long compared to the rest of the legs in the population. The average distance is about 34 nm, with a standard deviation of 24 nm.

We’re able to decompose a fairly complex problem into a number of independent functions, each of which can be easily tested in isolation. Our processing is a composition of simpler functions. This can lead to succinct, expressive functional programming.

5.4 The iter() function with a sentinel value

The built-in iter() function creates an iterator over an object of a collection class. The list, dict, and set classes all work with the iter() function to provide an iterator object for the items in the underlying collection. In most cases, we’ll allow the for statement to do this implicitly. In a few cases, however, we need to create an iterator explicitly. One example of this is to separate the head from the tail of a collection.
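The head-tail split looks like this:

>>> source_iter = iter([1, 2, 3, 4])
>>> head = next(source_iter)
>>> head
1
>>> list(source_iter)
[2, 3, 4]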

Other uses of the iter() function include building iterators to consume the values created by a callable object (for example, a function) until a sentinel value is found. This feature is sometimes used with the read() method of a file to consume items until some end-of-line or end-of-file sentinel value is found. An expression such as iter(file.read, ' ') will evaluate the given function until the sentinel value, ' ', is found. This must be used carefully: if the sentinel is not found, it can continue reading zero-length strings forever.
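The following illustrates the sentinel behavior with an in-memory stream, reading four-character blocks until an empty string signals the end of the data:

>>> import io
>>> stream = io.StringIO("hello world")
>>> blocks = iter(lambda: stream.read(4), "")
>>> list(blocks)
['hell', 'o wo', 'rld']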

Providing a callable function to iter() can be a bit challenging because the function we provide must maintain some state internally. This is generally looked at as undesirable in functional programs.

However, hidden state is a feature of an open file: each read() or readline() method of a file advances the internal state to the next character or the next line.

Another example of explicit iteration is the way that a mutable collection object’s pop() method makes a stateful change to a collection object. The following is an example of using the pop() method:

>>> source = [1, 2, 3, None, 4, 5, 6] 
>>> tail = iter(source.pop, None) 
>>> list(tail) 
[6, 5, 4]

The tail variable was set to an iterator over the list [1, 2, 3, None, 4, 5, 6] that will be traversed by the pop() function. The default behavior of pop() is pop(-1); that is, the elements are popped in the reverse order. This makes a stateful change to the list object: each time pop() is called, the item is removed, mutating the list. When the sentinel value is found, the iterator stops returning values. If the sentinel is not found, this will break with an IndexError exception.

This kind of internal state management is something we’d like to avoid. Consequently, we won’t try to contrive a use for this feature.

5.5 Using sorted() to put data in order

When we need to produce results in a defined order, Python gives us two choices. We can create a list object and use the list.sort() method to put items in an order. An alternative is to use the sorted() function. This function works with any iterable, but it creates a final list object as part of the sorting operation.

The sorted() function can be used in two ways. It can be simply applied to collections. It can also be used as a higher-order function using the key= argument.

Let’s say we have our trip data from the examples in Chapter 4, Working with Collections. We have a function that will generate a sequence of tuples with the starting location, end location, and distance for each leg of a trip. The data looks as follows:

[ 
 ((37.54901619777347, -76.33029518659048), (37.840832, -76.273834), 17.7246), 
 ((37.840832, -76.273834), (38.331501, -76.459503), 30.7382), 
 ((38.331501, -76.459503), (38.845501, -76.537331), 31.0756), 
 ((36.843334, -76.298668), (37.549, -76.331169), 42.3962), 
 ((37.549, -76.331169), (38.330166, -76.458504), 47.2866), 
 ((38.330166, -76.458504), (38.976334, -76.473503), 38.8019) 
]

We can see the default behavior of the sorted() function using the following interaction:

>>> sorted(dist(x) for x in trip) 
[0.1731, 0.1898, 1.4235, 4.3155, ... 86.2095, 115.1751, 129.7748]

We used a generator expression (dist(x) for x in trip) to extract the distances from our trip data. The dist() function is the function created by itemgetter(2). We then sorted this iterable collection of numbers to get the distances from 0.17 nm to 129.77 nm.

If we want to keep the legs and distances together in their original three-tuples, we can have the sorted() function apply a key= function to determine how to sort the tuples, as shown in the following code snippet:

>>> sorted(trip, key=dist) 
[((35.505665, -76.653664), (35.508335, -76.654999), 0.1731), ...

We’ve sorted the trip data, using the dist() function to extract the distance from each tuple. The dist() function, shown earlier, is created by the itemgetter() function as follows:

>>> from operator import itemgetter 
>>> dist = itemgetter(2)

As an alternative, we can use lambda leg: leg[2] to select a specific value from the tuple. Providing a name, dist, makes it a little clearer which item is being selected from the tuple.

5.6 Overview of writing higher-order functions

We’ll look at designing our own higher-order functions. We’ll summarize some of the process before diving into some more complex kinds of design patterns. We’ll start by looking at common data transformations, such as the following:

  • Wrap objects to create more complex objects

  • Unwrap complex objects into their components

  • Flatten a structure

  • Structure a flat sequence

These patterns can help to visualize ways higher-order functions can be designed in Python.

It can also help to recall that a callable class definition creates objects that behave as functions; the class itself acts as a function that returns a callable object. We’ll look at this as a way to write flexible functions into which configuration parameters can be injected.

We’ll defer deeper consideration of decorators until Chapter 12, Decorator Design Techniques. A decorator is also a higher-order function, but it consumes one function and returns another, making it more complex than the examples in this chapter. We’ll start with developing highly-customized versions of map() and filter().

5.7 Writing higher-order mappings and filters

Python’s two built-in higher-order functions, map() and filter(), generally handle almost everything we might want to throw at them. It’s difficult to optimize them in a general way to achieve higher performance. We’ll look at similar functions such as imap() in Chapter 14, The Multiprocessing, Threading, and Concurrent.Futures Modules.

We have three largely equivalent ways to express a mapping. Assume that we have some function, f(x), and some collection of objects, C. The ways we can compute a mapping from the domain value in C to a range value are as follows:

  • The map() function:

    map(f, C)
  • A generator expression:

    (f(x) for x in C)
  • A generator function with a yield statement:

    from collections.abc import Callable, Iterable, Iterator 
    from typing import Any 
     
    def mymap(f: Callable[[Any], Any], C: Iterable[Any]) -> Iterator[Any]: 
        for x in C: 
            yield f(x)

    This mymap() function can be used as an expression with the function to apply and the iterable source of data:

    mymap(f, C)

Similarly, we have three ways to apply a filter function to a collection, all of which are equivalent:

  • The filter() function:

    filter(f, C)
  • A generator expression:

    (x for x in C if f(x))
  • A generator function with a yield statement:

    from collections.abc import Callable, Iterable, Iterator 
    from typing import Any 
     
    def myfilter(f: Callable[[Any], bool], C: Iterable[Any]) -> Iterator[Any]: 
        for x in C: 
            if f(x): 
                yield x

    This myfilter() function can be used as an expression with the function to apply and the iterable source of data:

    myfilter(f, C)

There are some minor performance differences; often the map() and filter() functions are fastest. More importantly, there are different kinds of extensions that fit these mapping and filtering designs, which are as follows:

  • If we need to modify the processing, we can create a more sophisticated function, g(x), that is applied to each element. This is the most general approach and applies to all three designs. This is where the bulk of our functional design energy is invested. We may define our new function around the existing f(x), or we may find that we need to refactor the original function. In all cases, this design effort seems to yield the most benefits.

  • We can tweak the for loop inside the generator expression or generator function. One obvious tweak is to combine mapping and filtering into a single operation by extending the generator expression with an if clause, as sketched below. We can also merge the mymap() and myfilter() functions to combine mapping and filtering. This requires some care to be sure the resulting function is not a clutter of features.
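The combined mapping and filtering looks like this, where g(x) stands for the filter predicate:

    (f(x) for x in C if g(x))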

Profound changes that alter the structure of the data handled by the loop often happen as software evolves and matures. We have a number of design patterns, including wrapping, unwrapping (or extracting), flattening, and structuring. We’ve looked at a few of these techniques in previous chapters.

In the following sections, we’ll look at ways to design our own higher-order functions. We’ll start with unwrapping complex data while also applying a mapping function. For each example, it’s important to look at where the complexity arises, and decide if the resulting code really is succinct and expressive.

5.7.1 Unwrapping data while mapping

When we use a construct such as (f(x) for x, y in C), we use the multiple assignment feature of the for statement to unwrap a multi-valued tuple and then apply a function. The whole expression is a mapping. This is a common Python optimization to change the structure and apply a function.

We’ll use our trip data from Chapter 4, Working with Collections. The following is a concrete example of unwrapping while mapping:

from collections.abc import Callable, Iterable, Iterator 
from typing import Any, TypeAlias 
 
Conv_F: TypeAlias = Callable[[float], float] 
Leg: TypeAlias = tuple[Any, Any, float] 
 
def convert( 
        conversion: Conv_F, 
        trip: Iterable[Leg]) -> Iterator[float]: 
    return ( 
        conversion(distance) 
        for start, end, distance in trip 
    )

This higher-order function would be supported by conversion functions that we can apply to our raw data, as follows:

from collections.abc import Callable 
from typing import TypeAlias 
 
Conversion: TypeAlias = Callable[[float], float] 
 
to_miles: Conversion = lambda nm: nm * 6076.12 / 5280 
 
to_km: Conversion = lambda nm: nm * 1.852 
 
to_nm: Conversion = lambda nm: nm

These have been defined as lambdas and assigned to variables. Some static analysis tools will object to this because PEP 8 frowns on it.

The following shows how we can extract distance and apply a conversion function:

>>> convert(to_miles, trip) 
<generator object ...> 
>>> miles = list(convert(to_miles, trip)) 
>>> trip[0] 
((37.54901619777347, -76.33029518659048), (37.840832, -76.273834), 17.7246) 
>>> miles[0] 
20.397120559090908 
>>> trip[-1] 
((38.330166, -76.458504), (38.976334, -76.473503), 38.8019) 
>>> miles[-1] 
44.652462240151515

As we’re unwrapping, the result will be a sequence of floating-point values. The results are as follows:

[20.397120559090908, 35.37291511060606, ..., 44.652462240151515]

This convert() function is highly specific to our start-end-distance trip data structure, as the for statement decomposes a specific three-tuple.

We can build a more general solution for this kind of unwrapping-while-mapping design pattern. It suffers from being a bit more complex. First, we need general-purpose decomposition functions, as in the following code snippet:

from collections.abc import Callable 
from operator import itemgetter 
from typing import Any, TypeAlias 
 
Selector: TypeAlias = Callable[[tuple[Any, ...]], Any] 
 
fst: Selector = itemgetter(0) 
 
snd: Selector = itemgetter(1) 
 
sel2: Selector = itemgetter(2)

We’d like to be able to express f(sel2(s_e_d)) for s_e_d in trip. This involves functional composition; we’re combining a function, such as to_miles(), and a selector, such as sel2().

More descriptive names are often more useful than generic names. We’ll leave the renaming as an exercise for the reader. We can express functional composition in Python using yet another lambda, as follows:

from collections.abc import Callable 
from typing import Any 
 
to_miles_sel2: Callable[[tuple[Any, Any, float]], float] = ( 
    lambda s_e_d: to_miles(sel2(s_e_d)) 
)

This gives us a longer but more specialized version of unwrapping and mapping, as follows:

>>> miles2 = list( 
...     to_miles_sel2(s_e_d) for s_e_d in trip 
... )

We can compare the higher-order convert() function against this generator expression. Both apply a number of transformations. The convert() function "conceals" a processing detail—the composition of a tuple as start, end, and distance—with a for statement that decomposes the tuple. This expression exposes this decomposition by including the sel2() function as part of the definition of a composite function.

Neither is "better" by any measure. They represent two approaches to exposing or concealing details. In a specific application development context, the exposure (or concealment) might be more desirable.

The same design principle works to create hybrid filters as well as mappings. We’d apply the filter in an if clause of the generator expression that was returned.

We can combine mapping and filtering to create yet more complex functions. While it is appealing to create more complex functions, it isn’t always valuable. A complex function might not beat the performance of a nested use of the map() and filter() functions. Generally, we only want to create a more complex function if it encapsulates a concept and makes the software easier to understand.

5.7.2 Wrapping additional data while mapping

When we use a construct such as ((f(x), x) for x in C), we’ve used wrapping to create a multi-valued tuple while also applying a transformational mapping. This is a common technique to save derived results by creating larger constructs. This has the benefit of avoiding recalculation without the liability of complex objects with an internal state change. In this case, the state change is structural and very visible.

This is part of the example shown in Chapter 4, Working with Collections, to create the trip data from the path of points. The code looks like this:

>>> from Chapter04.ch04_ex1 import ( 
...    floats_from_pair, float_lat_lon, row_iter_kml, haversine, legs 
... ) 
>>> import urllib.request 
>>> data = "file:./Winter%202012-2013.kml" 
 
>>> with urllib.request.urlopen(data) as source: 
...     path = floats_from_pair(float_lat_lon(row_iter_kml(source))) 
...     trip = tuple( 
...         (start, end, round(haversine(start, end), 4)) 
...         for start, end in legs(path) 
...     )

We can revise this slightly to create a higher-order function that separates the wrapping from the other functions. We can refactor this design to create a function that constructs a new tuple including the original tuple and the distance. This function can be defined as follows:

from collections.abc import Callable, Iterable, Iterator 
from typing import TypeAlias 
 
Point: TypeAlias = tuple[float, float] 
Leg_Raw: TypeAlias = tuple[Point, Point] 
Point_Func: TypeAlias = Callable[[Point, Point], float] 
Leg_D: TypeAlias = tuple[Point, Point, float] 
 
def cons_distance( 
        distance: Point_Func, 
        legs_iter: Iterable[Leg_Raw]) -> Iterator[Leg_D]: 
    return ( 
        (start, end, round(distance(start, end), 4)) 
        for start, end in legs_iter 
    )

This function will decompose each leg into two variables, start and end. These variables will be Point instances, defined as tuples of two float values. These will be used with the given distance() function to compute the distance between the points. The function is a callable that accepts two Point objects and returns a float result. The result will build a three-tuple that includes the original two Point objects and also the calculated float result.

We can then rewrite our trip assignment to apply the haversine() function to compute distances, as follows:

>>> source_url = "file:./Winter%202012-2013.kml" 
>>> with urllib.request.urlopen(source_url) as source: 
...    path = floats_from_pair( 
...        float_lat_lon(row_iter_kml(source)) 
...    ) 
...    trip2 = tuple( 
...        cons_distance(haversine, legs(iter(path))) 
...    )

We’ve replaced a generator expression with a higher-order function, cons_distance(). The function not only accepts a function as an argument, but it also returns a generator expression. In some applications, this larger and more complex processing step is a helpful way to elide unnecessary details.

In Chapter 10, The Functools Module, we’ll show how to use the partial() function to set a value for the R parameter of the haversine() function, which changes the units in which the distance is calculated.

5.7.3 Flattening data while mapping

In Chapter 4, Working with Collections, we looked at algorithms that flattened a nested tuple-of-tuples structure into a single iterable. Our goal at the time was simply to restructure some data without doing any real processing. We can create hybrid solutions that combine a function with a flattening operation.

Let’s assume that we have a block of text that we want to convert to a flat sequence of numbers. The text looks as follows:

>>> text = """2 3 5 7 11 13 17 19 23 29 
... 31 37 41 43 47 53 59 61 67 71 
... 73 79 83 89 97 101 103 107 109 113 
... 127 131 137 139 149 151 157 163 167 173 
... 179 181 191 193 197 199 211 223 227 229 
... """

Each line is a block of 10 numbers. We need to unblock the rows to create a flat sequence of numbers.

This is done with a generator expression that has two for clauses, as follows:

>>> data = list( 
...     v 
...     for line in text.splitlines() 
...         for v in line.split() 
... )

This will split the text into lines and iterate through each line. It will split each line into words and iterate through each word. The output from this is a list of strings, as follows:

['2', '3', '5', '7', '11', '13', '17', '19', '23', '29', '31', '37',
'41', '43', '47', '53', '59', '61', '67', '71', '73', '79', '83',
'89', '97', '101', '103', '107', '109', '113', '127', '131', '137',
'139', '149', '151', '157', '163', '167', '173', '179', '181', '191',
'193', '197', '199', '211', '223', '227', '229']

There’s an optimization to this, which applies to this specific text. We’ll leave that as an exercise for the reader.

To convert the strings to numbers, we must apply a conversion function as well as unwind the blocked structure from its original format, using the following code snippet:

from collections.abc import Callable, Iterator 
from typing import TypeAlias 
 
Num_Conv: TypeAlias = Callable[[str], float] 
 
def numbers_from_rows( 
        conversion: Num_Conv, 
        text: str) -> Iterator[float]: 
    return ( 
        conversion(value) 
        for line in text.splitlines() 
        for value in line.split() 
    )

This function has a conversion argument, which is a function that is applied to each value that will be emitted. The values are created by flattening using the algorithm shown previously.

We can use this numbers_from_rows() function in the following kind of expression:

>>> list(numbers_from_rows(float, text))

Here we’ve used the built-in float() to create a list of floating-point values from the block of text.

We have many alternatives using mixtures of higher-order functions and generator expressions. For example, we might express this as follows:

>>> values = (value 
...     for line in text.splitlines() 
...     for value in line.split() 
... ) 
>>> numbers = map(float, values) 
>>> list(numbers)

This might help us understand the overall structure of the algorithm. The principle is called chunking: we summarize the details of a function with a meaningful name. With this summary, the details are abstracted and we can work with the function as a small concept in a larger context. While we often use higher-order functions, there are times when a generator expression can be clearer.

5.7.4 Structuring data while filtering

The previous three examples combined additional processing with mapping. Combining processing with filtering doesn’t seem to be quite as expressive as combining it with mapping. We’ll look at an example in detail to show that, although it is useful, it doesn’t seem to have as compelling a use case as combining mapping and processing.

In Chapter 4, Working with Collections, we looked at structuring algorithms. We can easily combine a filter with the structuring algorithm into a single, complex function. The following is a version of our preferred function to group the output from an iterable:

from collections.abc import Iterator 
from typing import TypeVar 
 
ItemT = TypeVar("ItemT") 
 
def group_by_iter( 
        n: int, 
        iterable: Iterator[ItemT] 
) -> Iterator[tuple[ItemT, ...]]: 
    def group(n: int, iterable: Iterator[ItemT]) -> Iterator[ItemT]: 
        for i in range(n): 
            try: 
                yield next(iterable) 
            except StopIteration: 
                return 
 
    while row := tuple(group(n, iterable)): 
        yield row

This will try to assemble a tuple of n items taken from an iterable object. If there are any items in the tuple, they are yielded as part of the resulting iterable. In principle, the function then operates recursively on the remaining items from the original iterable. As the recursion has limitations in Python, we’ve optimized the tail call structure into an explicit while statement.

The result of the group_by_iter() function is a sequence of n-tuples. In the following example, we’ll create a sequence of numbers using a filter function, and then group them into 7-tuples:

>>> from pprint import pprint 
>>> data = list( 
...     filter(lambda x: x % 3 == 0 or x % 5 == 0, range(1, 50)) 
... ) 
>>> data 
[3, 5, 6, 9, 10, ..., 48] 
>>> grouped = list(group_by_iter(7, iter(data))) 
>>> pprint(grouped) 
[(3, 5, 6, 9, 10, 12, 15), 
 (18, 20, 21, 24, 25, 27, 30), 
 (33, 35, 36, 39, 40, 42, 45), 
 (48,)]

We can merge grouping and filtering into a single function that does both operations in a single function body. The modification to group_by_iter() looks as follows:

from collections.abc import Callable, Iterator, Iterable 
from typing import Any, TypeAlias, TypeVar 
 
ItemT = TypeVar("ItemT") 
ItemFilterPredicate: TypeAlias = Callable[[Any], bool] 
 
def group_filter_iter( 
        n: int, 
        predicate: ItemFilterPredicate, 
        items: Iterator[ItemT] 
) -> Iterator[tuple[ItemT, ...]]: 
    def group(n: int, iterable: Iterator[ItemT]) -> Iterator[ItemT]: 
        for i in range(n): 
            try: 
                yield next(iterable) 
            except StopIteration: 
                return 
 
    subset = filter(predicate, items) 
    # ^-- Added this to apply the filter 
    while row := tuple(group(n, subset)): 
                              # ^-- Changed to use the filter 
        yield row

We’ve added a single line to the group_by_iter() function. This application of filter() creates a subset. We’ve changed the while row := tuple(group(n, subset)): line to use the subset instead of the original collection of items.

This group_filter_iter() function applies the filter predicate function to the source iterable provided as the items parameter. As the filter output is itself a non-strict iterable, the subset value isn’t computed in advance; the values are created as needed. The bulk of this function is identical to the version shown previously.

We can slightly simplify the context in which we use this function. We can compare the explicit use of filter() and this combined function where the filter() is implicit. The comparison is in the following example:

>>> rule: ItemFilterPredicate = lambda x: x % 3 == 0 or x % 5 == 0 
>>> groups_explicit = list( 
...    group_by_iter(7, filter(rule, range(1, 50))) 
... ) 
>>> groups = list( 
...     group_filter_iter(7, rule, iter(range(1, 50))) 
... )

Here, we’ve applied the filter predicate and grouped the results in a single function invocation. In the case of the filter() function, it’s rarely a clear advantage to apply the filter in conjunction with other processing. It seems as if a separate, visible filter() function is more helpful than a combined function.

5.8 Building higher-order functions with callables

We can define higher-order functions as callable classes. This builds on the idea of writing generator functions; we’ll write callable objects when we need stateful features of Python, like instance variables. Beyond holding state, instance variables also let us apply a static configuration when creating the higher-order function. The Strategy design pattern, in particular, works very nicely to alter the features of callable objects.

What’s important about a callable class definition is that the class object, created by the class statement, behaves as a function that emits a function. Commonly, we’ll use a callable object to create a composite function that combines other functions into something relatively complex.

To emphasize this, consider the following class:

from collections.abc import Callable 
from typing import Any 
 
class NullAware: 
    def __init__(self, some_func: Callable[[Any], Any]) -> None: 
        self.some_func = some_func 
 
    def __call__(self, arg: Any) -> Any: 
        return None if arg is None else self.some_func(arg)

This class is used to create a new function that is null aware. When an instance of this class is created, a function, some_func, is provided. The only restriction stated is that some_func be Callable[[Any], Any]. This means the function takes a single argument and returns a single result. The resulting object is callable and expects a single argument, which may be None. The implementation of the __call__() method handles a None argument by returning None without evaluating some_func. This method has the effect of making the resulting object Callable[[Optional[Any]], Optional[Any]].

For example, evaluating the NullAware(math.log) expression will create a new function that can be applied to argument values. The __init__() method will save the given function in the resulting object. This object is a function that can then be used to process data.

The common approach is to create the new function and save it for future use by assigning it a name, as follows:

import math 
 
null_log_scale = NullAware(math.log) 
 
null_round_4 = NullAware(lambda x: round(x, 4))

The first example creates a new function and assigns it the name null_log_scale. The second example creates a null-aware function, null_round_4, that uses a lambda object as the internal function to apply when the argument is not None. We can then use these functions in another context. Take a look at the following example:

>>> some_data = [10, 100, None, 50, 60] 
>>> scaled = map(null_log_scale, some_data) 
>>> [null_round_4(v) for v in scaled] 
[2.3026, 4.6052, None, 3.912, 4.0943]

This example’s __call__() method relies entirely on expression evaluation. It’s an elegant and tidy way to define composite functions built up from lower-level component functions.

5.8.1 Assuring good functional design

The idea of stateless functional programming requires some care when using Python objects. Objects are typically stateful. Indeed, one can argue that the entire purpose of object-oriented programming is to encapsulate state change into class definitions. Because of this, we find ourselves pulled in opposing directions between functional programming and imperative programming when using Python class definitions to process collections.

Using a callable object to create a composite function gives us slightly simpler syntax when the resulting composite function is used. When we start working with iterable mappings or reductions, we have to be aware of how and why we introduce stateful objects.

We’ll turn to a fairly complex function that contains the following features:

  • It applies a filter to a source of items.

  • It applies a mapping to the items which pass the filter.

  • It computes a sum of the mapped values.

We can try to define it as a simple higher-order function, but with three separate parameter values, it would be cumbersome to use. Instead, we’ll create a callable object that is configured by the filter and mapping functions.

Using objects to configure an object is the Strategy design pattern, used in object-oriented programming. Here’s a class definition that requires the filter and mapping function in order to create a callable:

from collections.abc import Callable, Iterable 
 
class Sum_Filter: 
    __slots__ = ["filter", "function"] 
 
    def __init__(self, 
            filter: Callable[[float], bool], 
            func: Callable[[float], float]) -> None: 
        self.filter = filter 
        self.function = func 
 
    def __call__(self, iterable: Iterable[float]) -> float: 
        return sum( 
            self.function(x) 
            for x in iterable 
            if self.filter(x) 
        )

This class has two slots in each object; this puts a few constraints on our ability to use the function as a stateful object. It doesn’t prevent all modifications to the resulting object, but it limits us to just two attributes. Attempting to add an attribute will result in an exception.

The initialization method, __init__(), stows the two function objects, filter and func, in the object’s instance variables. The __call__() method returns a value based on a generator expression that uses the two internal function definitions. The self.filter() function is used to pass or reject items. The self.function() function is used to transform objects that are passed by the self.filter() function.

An instance of this class is a function that has two Strategy functions built into it. We create an instance as follows:

count_not_none = Sum_Filter( 
    lambda x: x is not None, 
    lambda x: 1 
)

We’ve built a function named count_not_none() that counts the non-None values in a sequence. It does this by using a lambda to pass non-None values and a function that uses a constant, 1, instead of the actual values present.

Generally, this count_not_none() object will behave like any other Python function. We can use the count_not_none() function as follows:

>>> some_data = [10, 100, None, 50, 60] 
>>> count_not_none(some_data) 
4

This shows a technique for using some of Python’s object-oriented programming features to create callable objects that are used in a functional approach to designing and building software. We can delegate some complexity to creating a sophisticated function. Having a single function with multiple features can simplify understanding of the context in which the function is used.

5.9 Review of some design patterns

The max(), min(), and sorted() functions have a default behavior without a key= function. They can be customized by providing a function that defines how to compute a key from the available data. For many of our examples, the key= function has been a simple extraction of available data. This isn’t a requirement; the key= function can do anything.

Imagine an expression such as max(trip, key=lambda leg: random.random()). Generally, we try not to have key= functions that do something obscure like this.

The use of a key= function is a common design pattern. Functions we design can easily follow this pattern.
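As a small illustration, here’s a sketch of a function of our own that follows the key= pattern; the name extremes() and its details are hypothetical:

from collections.abc import Callable, Iterable 
from typing import Any 
 
def extremes( 
    source: Iterable[Any], 
    key: Callable[[Any], Any] = lambda x: x 
) -> tuple[Any, Any]: 
    # Materialize once so both reductions see the same items. 
    items = list(source) 
    return min(items, key=key), max(items, key=key)

A client can supply any key= function, mirroring the built-in behavior:

>>> extremes([(1, 9), (3, 2)], key=lambda pair: pair[1]) 
((3, 2), (1, 9))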

We’ve also looked at the way lambda forms can simplify the application of higher-order functions. One significant advantage of lambda forms is that they follow the functional paradigm very closely. When writing more conventional functions, we can create imperative programs that might clutter an otherwise succinct and expressive functional design.

We’ve looked at several kinds of higher-order functions that work with a collection of values. Throughout the previous chapters, we’ve hinted at several different design patterns for higher-order functions that apply to collection objects and scalar objects. The following is a broad classification:

  • Return a generator: A higher-order function can return a generator expression. We consider the function higher-order because it doesn’t return a scalar value or a collection of values. Some of these higher-order functions also accept functions as arguments.

  • Act as a generator: Some function examples use the yield statement to make them first-class generator functions. The value of a generator function is an iterable collection of values that are evaluated lazily. A generator function is essentially indistinguishable from a function that returns a generator expression: both are non-strict, and both can yield a sequence of values. For this reason, we’ll also consider generator functions to be higher order; see the sketch following this list. Built-in functions, such as map() and filter(), fall into this category.

  • Materialize a collection: Some functions must return a materialized collection object: list, tuple, set, or mapping. These kinds of functions can be of a higher order if they have a function as part of the arguments. Otherwise, they’re ordinary functions that happen to work with collections.

  • Reduce a collection: Some functions work with an iterable to create a scalar result. The len() and sum() functions are examples of this. We can create higher-order reductions when we accept a function as an argument. We’ll return to this in the next chapter.

  • Scalar: Some functions act on individual data items. These can be higher-order functions if they accept another function as an argument.
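To illustrate the distinction in the second bullet, the following sketch shows two functions that are indistinguishable to a consumer; the names are our own:

def doubled_yield(source): 
    # A generator function: the yield statement makes it lazy. 
    for item in source: 
        yield item * 2 
 
def doubled_expr(source): 
    # Returns a generator expression; equally lazy. 
    return (item * 2 for item in source)

Both produce a non-strict sequence of values when iterated.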

As we design our own software, we can pick and choose among these established design patterns.

5.10 Summary

In this chapter, we have seen two reductions that are higher-order functions: max() and min(). We looked at the two central higher-order functions, map() and filter(). We also looked at sorted().

In addition, we looked at how to use a higher-order function to transform the structure of data. We can perform several common transformations, including wrapping, unwrapping, flattening, and structuring sequences of different kinds.

We looked at two ways to define our own higher-order functions, which are as follows:

  • The def statement, which is similar to assigning a lambda form to a variable.

  • Defining a callable class as a kind of function that emits composite functions.

We can also use decorators to emit composite functions. We’ll return to this in Chapter 12, Decorator Design Techniques.

In the next chapter, we’ll look at the idea of purely functional iteration via recursion. We’ll use Pythonic structures to make several common improvements over purely functional techniques. We’ll also look at the associated problem of performing reductions from collections to individual values.

5.11 Exercises

This chapter’s exercises are based on code available from Packt Publishing on GitHub. See https://github.com/PacktPublishing/Functional-Python-Programming-3rd-Edition.

In some cases, the reader will notice that the code provided on GitHub includes partial solutions to some of the exercises. These serve as hints, allowing the reader to explore alternative solutions.

In many cases, exercises will need unit test cases to confirm they actually solve the problem. These are often identical to the unit test cases provided in the GitHub repository. The reader should replace the book’s example function name with their own solution to confirm that it works.

5.11.1 Classification of state

A web application might have a number of servers of various kinds, a database, and installed software components. Someone responsible for website reliability will want to know when things are running reasonably well. When things are broken, they’ll want details.

As part of monitoring, a health application can gather status from the various components and summarize the status into an overall "health" value. The idea is to perform a kind of "reduce" on the status information.

Each individual service has a status URL that can be pinged for status information. The results can take one of these four values:

  • No response at all. The service is not working. This is bad.

  • A response that is outside a healthy time window. Even if the response is "working", the service is degraded.

  • A response of "not working". This is almost as bad as no response at all. It indicates a severe problem, but it also means the monitoring software is working.

  • A response of "working". This is the ideal response.

The status forms a collection of 3-tuples: ("service", "status", response time). The service is a name, like "primary database" or "router" or any of numerous other services that can be part of a distributed web application. The status value is a string of either "working" or "not working". The response time is the number of milliseconds it took to respond; typical values are 10 to 50.

The summary is one of the following values:

  • Stopped: At least one service is not responding.

  • Degraded: At least one service has responded with a time outside the healthy window of 50 milliseconds or less, or at least one service has responded with "not working".

  • Running: All services are working and responding within the 50-millisecond window.

Two of the possible implementations are as follows:

  • Write four filter functions. Apply the filters to the sequence of status values and count how many match each filter. Based on the number of matches, decide which of the three responses to provide for the overall health of the system.

  • Write a mapping to apply a severity number: 2 for an indication of Stopped, 1 for either of the indications of Degraded, or 0 for all other service status messages. The maximum of these severity numbers is the overall health of the system; see the sketch after this list.
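As a starting point, here’s a minimal sketch of the second approach. The representation of a non-responding service is an assumption; we record it with a status of None:

from collections.abc import Iterable 
 
# (service name, status or None for no response, response time in ms) 
Status = tuple[str, str | None, float] 
 
def severity(item: Status) -> int: 
    service, status, response_ms = item 
    if status is None: 
        return 2  # Stopped 
    if status == "not working" or response_ms > 50: 
        return 1  # Degraded 
    return 0  # Running 
 
def overall_health(log: Iterable[Status]) -> int: 
    return max(map(severity, log), default=0)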

Implement both variations. Compare the resulting code for clarity and expressiveness.

5.11.2 Classification of state, Part II

In the previous exercise, services were described as reporting a status value as a string value of either "working" or "not working".

Before proceeding, either complete the previous exercise, or develop a workable design to solve the previous exercise.

Due to technology upgrades, the status values for some services include a third value: "degraded". This has the same implication as a slow response from a service. This may change the design. It will certainly change the implementation.

Provide an implementation that gracefully handles the idea of additional or distinct status messages. The idea is to isolate the status-message checking to a function that can be replaced easily. For example, we might start with three functions to evaluate status values: is_stopped(), is_degraded(), and is_working(). When a change is required, we can write a new version, is_degraded_2(), that can be used in place of the old is_degraded() function.

The objective is to create an application that does not require a change to the implementation of any particular function. Instead, new functions are added; these new functions will reuse existing functions plus new functions to complete the expanded objectives.
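A minimal sketch of this idea, with hypothetical signatures, might look like this:

def is_degraded(status: str, response_ms: float) -> bool: 
    return status == "not working" or response_ms > 50 
 
def is_degraded_2(status: str, response_ms: float) -> bool: 
    # The new "degraded" status has the same meaning as a slow response. 
    return status == "degraded" or is_degraded(status, response_ms)

The rest of the application accepts any checker with this signature, so swapping in is_degraded_2() requires no change to the existing functions.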

5.11.3 Optimizing a file parser

In Flattening data while mapping, we used the following expression to extract a sequence of numbers from text with space separators:

( 
    v 
    for line in text.splitlines() 
    for v in line.split() 
)

The definition of the split() method, when it’s used without an argument value, includes the newline character, \n, which is also the character used by the splitlines() method. It seems like this can be optimized to use only the split() method.
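One possible restructuring is the following sketch; confirm that it produces the same values as the original expression:

( 
    v 
    for v in text.split() 
)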

After getting this to work, change the source text in the example to:

>>> text = """2,3,5,7,11,13,17,19,23,29 
... 31,37,41,43,47,53,59,61,67,71 
... 73,79,83,89,97,101,103,107,109,113 
... 127,131,137,139,149,151,157,163,167,173 
... 179,181,191,193,197,199,211,223,227,229 
... """

We can parse this with a single use of the split() method. This requires restructuring a single, long sequence of values into various rows and columns.

Is this faster than the use of the splitlines() and split() methods?
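The timeit module offers one way to answer this question. Here is a minimal timing harness; the sample text and function names are our own, and the comma-separated case is left as the exercise:

import timeit 
 
text = "2 3 5 7 11 13\n17 19 23 29 31 37\n" 
 
def parse_two_methods() -> list[str]: 
    # Original: splitlines(), then split() on each line. 
    return [v for line in text.splitlines() for v in line.split()] 
 
def parse_one_method() -> list[str]: 
    # Candidate: split() alone treats newlines as whitespace. 
    return text.split() 
 
t1 = timeit.timeit(parse_two_methods, number=100_000) 
t2 = timeit.timeit(parse_one_method, number=100_000) 
print(f"two methods: {t1:.4f}s  one method: {t2:.4f}s")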

Join our community Discord space

Join our Python Discord workspace to discuss and know more about the book: https://packt.link/dHrHU

