This is basically another find-and-replace type of program, like swapping all the numbers in a string in chapter 4 or all the vowels in a string in chapter 8. We wrote those programs using very manual, imperative methods, like iterating through all the characters of a string, comparing them to some wanted value, and possibly returning a new value.
In the final solution for chapter 8, we briefly touched on “regular expressions” (also called “regexes”--pronounced with a soft “g” like in “George”), which give us a declarative way to describe patterns of text. The material here may seem a bit of a reach, but I really want to help you dig into regexes to see what they can do.
bl br ch cl cr dr fl fr gl gr pl pr sc sh sk sl sm sn sp st sw th tr tw thw wh wr sch scr shr sph spl spr squ str thr
These are the first three words our program will produce for “cake”:
$ ./rhymer.py cake | head -3 bake blake brake
$ ./rhymer.py cake | tail -3 xake yake zake
Make sure your output is sorted alphabetically as this is important for the tests.
We’ll replace any leading consonants with a list of other consonant sounds to create a total of 56 words:
$ ./rhymer.py cake | wc -l 56
Note that we’ll replace all the leading consonants, not just the first one. For instance, with the word “chair” we need to replace “ch”:
$ ./rhymer.py chair | tail -3 xair yair zair
If a word like “apple” does not start with a consonant, we’ll append all the consonant sounds to the beginning to create words like “bapple” and “shrapple.”
$ ./rhymer.py apple | head -3 bapple blapple brapple
Because there is no consonant to replace, words that start with a vowel will produce 57 rhyming words:
$ ./rhymer.py apple | wc -l 57
To make this a bit easier, the output should always be all lowercase, even if the input has uppercase letters:
$ ./rhymer.py GUITAR | tail -3 xuitar yuitar zuitar
If a word contains nothing but consonants, we’ll print a message stating that the word cannot be rhymed:
$ ./rhymer.py RDNZL Cannot rhyme "RDNZL" The task of finding the initial consonants is made significantly easier with regexes. |
|
The program takes a single, positional argument, which is the string to rhyme. Figure 14.1 shows a snazzy, jazzy, frazzy, thwazzy string diagram.
If given no arguments or the -h
or --help
flags, it should print a usage statement:
$ ./rhymer.py -h usage: rhymer.py [-h] word Make rhyming "words" positional arguments: word A word to rhyme optional arguments: -h, --help show this help message and exit
To my mind, the main problem of the program is breaking the given word into the leading consonant sounds and the rest--something like the “stem” of the word.
To start out, we can define a placeholder for a function I call stemmer()
that does nothing right now:
def stemmer():
"""Return leading consonants (if any), and 'stem' of word"""
pass ①
① The pass statement will do nothing at all. Since the function does not return a value, Python will return None by default.
Then we can define a test_stemmer()
function to help us think about the values we might give the function and what we expect it to return. We want a test with good values like “cake” and “apple” that can be rhymed as well as values like the empty string or a number, which cannot:
def test_stemmer(): """ Test stemmer """ assert stemmer('') == ('', '') ① ② assert stemmer('cake') == ('c', 'ake') assert stemmer('chair') == ('ch', 'air') ③ ④ assert stemmer('APPLE') == ('', 'apple') assert stemmer('RDNZL') == ('rdnzl', '') ⑤ ⑥ assert stemmer('123') == ('123', '')
The tests cover the following good and bad inputs:
② A word with a single leading consonant
③ A word with a leading consonant cluster
④ A word with no initial consonants; also an uppercase word, so this checks that lowercase is returned
⑥ Something that isn’t a word at all
I decided that my stemmer()
function will always returns a 2-tuple of the (start,
rest)
of the word. (You can write a function that does something different, but be sure to change the test to match.) It’s the second part of that tuple
--the rest
--that we can use to create rhyming words. For instance, the word “cake” produces a tuple
with ('c',
'ake')
, and “chair” is split into ('ch',
'air')
. The argument “APPLE
” has no start
and only the rest
of the word, which is lowercase.
When I’m writing tests, I usually try to provide both good and bad data to my functions and programs. Three of the test values cannot be rhymed: the empty string (''
), a string with no vowels ('RDNZL'
), and a string with no letters ('123'
). The stemmer()
function will still return a tuple
containing the lowercased word in the first position of the tuples and the empty string in the second position for the rest
of the word. It is up to the calling code to deal with a word that has no part that can be used to rhyme.
It’s certainly possible to write this program without regular expressions, but I hope you’ll see how radically different using regexes can be from manually writing your own search-and-replace code.
To start off, we need to bring in the re
module:
>>> import re
I encourage you to read help(re)
to get a feel for all that you can do with regexes. They are a deep subject with many books and whole branches of academia devoted to them (Mastering Regular Expressions by Jeffrey Friedl (O’Reilly, 2006) is one book I would recommend). There are many helpful websites that can further explain regexes, and some can help you write them (such as https://regexr.com/). We will only scratch the surface of what you can do with regexes.
Our goal in this program is to write a regex that will find consonants at the beginning of a string. We can define consonants as the characters of the English alphabet that are not vowels (“a,” “e,” “i,” “o,” and “u”). Our stemmer()
function will only return lowercase letters, so there are only 21 consonants we need to define. You could write them out, but I’d rather write a bit of code!
We can start with string.ascii_lowercase
:
>>> import string >>> string.ascii_lowercase 'abcdefghijklmnopqrstuvwxyz'
Next, we can use a list comprehension with a “guard” clause to filter out the vowels. As we want a str
of consonants and not a list
, we can use str.join()
to make a new str
value:
>>> import string as s >>> s.ascii_lowercase 'abcdefghijklmnopqrstuvwxyz' >>> consonants = ''.join([c for c in s.ascii_lowercase if c not in 'aeiou']) >>> consonants 'bcdfghjklmnpqrstvwxyz'
The longer way to write this with a for
loop and an if
statement is as follows (see figure 14.2):
consonants = '' for c in string.ascii_lowercase: if c not in 'aeiou': consonants += c
In chapter 8 we created a “character class” for matching the vowels by listing them in square brackets, like '[aeiou]'
. We can do the same here with our consonants
, like so:
>>> pattern = '[' + consonants + ']' >>> pattern '[bcdfghjklmnpqrstvwxyz]'
The re
module has two search-like functions called re.match()
and re.search()
, and I always get them confused. They both look for a pattern
(the first argument) in some text
, but the re.match()
functions starts from the beginning of the text
, whereas the re.search()
function will match starting anywhere in the text
.
As it happens, re.match()
is just fine because we are looking for consonants at the beginning of a string (see figure 14.3).
>>> text = 'chair' >>> re.match(pattern, text) ① <re.Match object; span=(0, 1), match='c'> ②
① Try to match the given pattern in the given text. If this succeeds, we get a re.Match object; otherwise, the value None is returned.
② The match was successful, so we see a “stringified” version of the re.Match object.
The match='c'
shows us that the regular expression found the string 'c'
at the beginning. Both the re.match()
and re.search()
functions will return a re.Match
object on success. You can read help(re.Match)
to learn more about all the cool things you can do with them:
>>> match = re.match(pattern, text) >>> type(match) <class 're.Match'>
How do we get our regex to match the letters 'ch'
? We can put a '+'
sign after the character class to say we want one or more (see figure 14.4). (Does this sound a bit like nargs='+'
to say one or more arguments?) I will use an f-string here to create the pattern:
>>> re.match(f'[{consonants}]+', 'chair') <re.Match object; span=(0, 2), match='ch'>
What does it give us for a string with no leading consonants like “apple,” as in figure 14.5?
>>> re.match(f'[{consonants}]+', 'apple')
It seems we got nothing back from that. What is the type()
of that return value?
>>> type(re.match(f'[{consonants}]+', 'apple')) <class 'NoneType'>
Both the re.match()
and re.search()
functions return None
to indicate a failure to match any text. We know that only some words will have a leading consonant sound, so this is not surprising. We’ll see in a moment how to make this an optional match.
It’s all well and good to have found (or not) the leading consonants, but the goal here is to split the text
into two parts: the consonants (if any) and the rest of the word.
We can wrap parts of the regex in parentheses to create “capture groups.” If the regex matches successfully, we can recover the parts using the re.Match.groups()
method (see figure 14.6):
>>> match = re.match(f'([{consonants}]+)', 'chair') >>> match.groups() ('ch',)
To capture everything that comes after the consonants
, we can use a period (.
) to match anything, and add a plus sign (+
) to mean one or more. We can put that into parentheses to capture it (see figure 14.7):
>>> match = re.match(f'([{consonants}]+)(.+)', 'chair') >>> match.groups() ('ch', 'air')
What happens when we try to use this on “apple”? It fails to make the first match on the consonants, so the whole match fails and returns None
(see figure 14.8):
>>> match = re.match(f'([{consonants}]+)(.+)', 'apple') >>> match.groups() Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'NoneType' object has no attribute 'groups'
Remember that re.match()
returns None
when it fails to find the pattern. We can add a question mark (?
) at the end of the consonants
pattern to make it optional (see figure 14.9):
>>> match = re.match(f'([{consonants}]+)?(.+)', 'apple') >>> match.groups() (None, 'apple')
The match.groups()
function returns a tuple
containing the matches for each grouping created by the parentheses. You can also use match.group()
(singular) with a group number to get a specific group. Note that these start numbering from 1:
>>> match.group(1) ① >>> match.group(2) ② 'apple'
① There was no match for the first group on “apple,” so this is a None.
② The second group captured the entire word.
If you match on “chair,” there are values for both groups:
>>> match = re.match(f'([{consonants}]+)?(.+)', 'chair') >>> match.group(1) 'ch' >>> match.group(2) 'air'
So far we’ve only dealt with lowercase text because our program will always emit lowercase values. Still, let’s explore what happens when we try to match uppercase text:
>>> match = re.match(f'([{consonants}]+)?(.+)', 'CHAIR') >>> match.groups() (None, 'CHAIR')
Not surprisingly, that fails. Our pattern only defines lowercase characters. We could add all the uppercase consonants, but it’s a bit easier to use a third optional argument to re.match()
to specify case-insensitive searching:
>>> match = re.match(f'([{consonants}]+)?(.+)', 'CHAIR', re.IGNORECASE) >>> match.groups() ('CH', 'AIR')
Or you can force the text you are searching to lowercase:
>>> match = re.match(f'([{consonants}]+)?(.+)', 'CHAIR'.lower()) >>> match.groups() ('ch', 'air')
What do you get when you search on text that has nothing but consonants?
>>> match = re.match(f'([{consonants}]+)?(.+)', 'rdnzl') >>> match.groups() ('rdnz', 'l')
Were you expecting the first group to include all the consonants and the second group to have nothing? It might seem a bit odd that it decided to split off the “l” into the last group, as shown in figure 14.10, but we have to think extremely literally about how the regex engine is working. We described an optional group of one or more consonants that must be followed by one or more of anything else. The “l” counts as one or more of anything else, so the regex matched exactly what we requested.
If we change the (.+)
to (.*)
to make it zero or more, it works as expected:
>>> match = re.match(f'([{consonants}]+)?(.*)', 'rdnzl') >>> match.groups() ('rdnzl', '')
Our regex is not quite complete, as it doesn’t handle matching on something like 123
. That is, it matches too well because the period (.
) will match the digits, which we don’t want:
>>> re.match(f'([{consonants}]+)?(.*)?', '123') <re.Match object; span=(0, 3), match='123'>
We need to indicate that there should be at least one vowel after the consonants, which may be followed by anything else. We can use another character class to describe any vowel. Since we need to capture this, we’ll put it in parentheses, so ([aeiou])
. That may be followed by zero or more of anything, which also needs to be captured, so (.*)
, as shown in figure 14.11.
Let’s go back and try this on values we expect to work:
>>> re.match(f'([{consonants}]+)?([aeiou])(.*)', 'cake').groups() ('c', 'a', 'ke') >>> re.match(f'([{consonants}]+)?([aeiou])(.*)', 'chair').groups() ('ch', 'a', 'ir') >>> re.match(f'([{consonants}]+)?([aeiou])(.*)', 'apple').groups() (None, 'a', 'pple')
As you can see, this fails to match when the string contains no vowels or letters:
>>> type(re.match(f'([{consonants}]+)?([aeiou])(.*)', 'rdnzl')) <class 'NoneType'> >>> type(re.match(f'([{consonants}]+)?([aeiou])(.*)', '123')) <class 'NoneType'>
We know that our program will receive some inputs that cannot be rhymed, so what should the stemmer()
function do with these? Some people like to use exceptions in cases like this. We’ve encountered exceptions like asking for a list index or a dictionary key that does not exist. If exceptions are not caught and handled, they cause our programs to crash!
I try to avoid writing code that creates exceptions. I decided that my stemmer()
function would always return a 2-tuple of (start,
rest)
, and that I would always use the empty string to denote a missing value rather than a None
. Here is one way I could write the code for returning those tuples:
if match: ① p1 = match.group(1) or '' ② p2 = match.group(2) or '' p3 = match.group(3) or '' return (p1, p2 + p3) ③ else: return (word, '') ④
① The match will be None if the regex failed, which is “falsey.” If it succeeds, then it will be “truthy.”
② There are three capture groups that we can put into three variables. We want to ensure we don’t return any None values, so we can use an “or” to evaluate the left side as “truthy” and take the empty string on the right if it’s not.
③ Return a tuple that has the first part of the word (maybe consonants) and the “rest” of the word (the vowel plus anything else).
④ If the match was None, return a tuple of the word and an empty string to indicate there is no “rest” of the word to rhyme.
Let’s take a moment to think about the or
operator, which we’re using to decide between something on the left or something on the right. The or
will return the first “truthy” value, the one that--sort of, kind of--evaluates to True
in a Boolean context:
>>> True or False ① True >>> False or True ② True >>> 1 or 0 ③ 1 >>> 0 or 1 ④ 1 >>> 0.0 or 1.0 ⑤ 1.0 >>> '0' or '' ⑥ '0' >>> 0 or False ⑦ False >>> [] or ['foo'] ⑧ ['foo'] >>> {} or dict(foo=1) ⑨ {'foo': 1}
① It’s easiest to see with literal True and False values.
② No matter the order, the True value will be taken.
③ In a Boolean context, the integer value 0 is “falsey,” and any other value is “truthy.”
④ The number values behave exactly like actual Boolean values.
⑤ Floating-point values also behave like integer values, where 0.0 is “falsey” and anything else is “truthy.”
⑥ With string values, the empty string is “falsey” and anything else is “truthy.” It may look odd because it returns '0', but that’s not the numeric value zero but the string we use to represent the value of zero. Wow, so philosophical.
⑦ If no value is “truthy,” the last value is returned.
⑧ The empty list is “falsey,” so any non-empty list is “truthy.”
⑨ The empty dict is “falsey,” and any non-empty dict is “truthy.”
You should be able to use these ideas to write a stemmer()
function that will pass the test_stemmer()
function. Remember, if both of these functions are in your rhymer.py program, you can run the test_
functions like so:
$ pytest -xv rhymer.py
Let’s review what the program should do:
Try to split it into two parts: any leading consonants and the rest of the word.
If the split is successful, combine the “rest” of the word (which might actually be the entire word if there are no leading consonants) with all the other consonant sounds. Be sure to not include the original consonant sound and to sort the rhyming strings.
If you are unable to split the word, print the message Cannot
rhyme
"<word>"
.
Now it’s time to write the program. Have fun storming the castle!
“No more rhymes now, I mean it!”
Let’s take a look at one way to solve this problem. How different was your solution?
#!/usr/bin/env python3 """Make rhyming words""" import argparse import re ① import string # -------------------------------------------------- def get_args(): """get command-line arguments""" parser = argparse.ArgumentParser( description='Make rhyming "words"', formatter_class=argparse.ArgumentDefaultsHelpFormatter) parser.add_argument('word', metavar='word', help='A word to rhyme') return parser.parse_args() # -------------------------------------------------- def main(): """Make a jazz noise here""" args = get_args() ② prefixes = list('bcdfghjklmnpqrstvwxyz') + ( ③ 'bl br ch cl cr dr fl fr gl gr pl pr sc ' 'sh sk sl sm sn sp st sw th tr tw thw wh wr ' 'sch scr shr sph spl spr squ str thr').split() start, rest = stemmer(args.word) ④ if rest: ⑤ print(' '.join(sorted([p + rest for p in prefixes if p != start]))) ⑥ else: print(f'Cannot rhyme "{args.word}"') ⑦ # -------------------------------------------------- def stemmer(word): """Return leading consonants (if any), and 'stem' of word""" word = word.lower() ⑧ vowels = 'aeiou' ⑨ consonants = ''.join( ⑩ [c for c in string.ascii_lowercase if c not in vowels]) pattern = ( ⑪ '([' + consonants + ']+)?' # capture one or more, optional '([' + vowels + '])' # capture at least one vowel '(.*)' # capture zero or more of anything ) match = re.match(pattern, word) ⑫ if match: ⑬ p1 = match.group(1) or '' ⑭ p2 = match.group(2) or '' p3 = match.group(3) or '' return (p1, p2 + p3) ⑮ else: return (word, '') ⑯ # -------------------------------------------------- def test_stemmer(): ⑰ """test the stemmer""" assert stemmer('') == ('', '') assert stemmer('cake') == ('c', 'ake') assert stemmer('chair') == ('ch', 'air') assert stemmer('APPLE') == ('', 'apple') assert stemmer('RDNZL') == ('rdnzl', '') assert stemmer('123') == ('', '') # -------------------------------------------------- if __name__ == '__main__': main()
① The re module is for regular expressions.
② Get the command-line arguments.
③ Define all the prefixes that will be added to create rhyming words.
④ Split the word argument into two possible parts. Because the stemmer() function always returns a 2-tuple, we can unpack the values into two variables.
⑤ Check if there is a part of the word that we can use to create rhyming strings.
⑥ If there is, use a list comprehension to iterate through all the prefixes and add them to the stem of the word. Use a guard to ensure that any given prefix is not the same as the beginning of the word. Sort all the values and print them, joined on newlines.
⑦ If there is nothing for the “rest” of the word that can be used to create rhymes, let the user know.
⑨ Since we will use the vowels more than once, assign them to a variable.
⑩ The consonants are the letters that are not vowels. We will only match to lowercase letters.
⑪ The pattern is defined using consecutive literal strings that Python will join together into one string. By breaking up the pieces onto separate lines, we can comment on each part of the regular expression.
⑫ Use the re.match() function to start matching at the beginning of the word.
⑬ The re.match() function will return None if the pattern failed to match, so check if the match is “truthy” (not None).
⑭ Put each group into a variable, always ensuring that we use the empty string rather than None.
⑮ Return a new tuple that has the “first” part of the word (possible leading consonants) and the “rest” of the word (the vowel plus anything else).
⑯ If the match failed, return the word and an empty string for the “rest” of the word to indicate there is nothing to rhyme.
⑰ The tests for the stemmer() function. I usually like to put my unit tests directly after the functions they test.
There are many ways you could have written this, but, as always, I wanted to break the problem down into units I could write and test. For me, this came down to splitting the word into a possible leading consonant sound and the rest of the word. If I can manage that, I can create rhyming strings; if I cannot, then I need to alert the user.
For the purposes of this program, the “stem” of a word is the part after any initial consonants, which I define using a list comprehension with a guard to take only the letters that are not vowels:
>>> vowels = 'aeiou' >>> consonants = ''.join([c for c in string.ascii_lowercase if c not in vowels])
Throughout the chapters, I have shown how a list comprehension is a concise way to generate a list and is preferable to using a for
loop to append to an existing list. Here we have added an if
statement to only include some characters if they are not vowels. This is called a guard statement, and only those elements that evaluate as “truthy” will be included in the resulting list
.
We’ve looked at map()
several times now and talked about how it is a higher-order function (HOF) because it takes another function as the first argument and will apply it to all the elements from some iterable (something that can be iterated, like a list
). Here I’d like to introduce another HOF called filter()
, which also takes a function and an iterable (see figure 14.12). As with the list comprehension with the guard, only those elements that return a “truthy” value from the function are allowed in the resulting list
.
Here is another way to write the idea of the list comprehension using filter()
:
>>> consonants = ''.join(filter(lambda c: c not in vowels, string.ascii_lowercase))
Just as with map()
, I use the lambda
keyword to create an anonymous function. The c
is the variable that will hold the argument, which, in this case, will be each character from string.ascii_lowercase
. The entire body of the function is the evaluation c
not
in
vowels
. Each of the vowels will return False
for this:
>>> 'a' not in vowels False
And each of the consonants will return True
:
>>> 'b' not in vowels True
Therefore, only the consonants will be allowed to pass through filter()
. Think back to our “blue” cars; let’s write a filter()
that only accepts cars that start with the string “blue”:
>>> cars = ['blue Honda', 'red Chevy', 'blue Ford'] >>> list(filter(lambda car: car.startswith('blue '), cars)) ['blue Honda', 'blue Ford']
When the car
variable has the value “red Chevy,” the lambda
returns False
, and that value is rejected:
>>> car = 'red Chevy' >>> car.startswith('blue ') False
Note that if none of the elements from the original iterable are accepted, filter()
will produce an empty list
([]
). For example, I could filter()
for numbers greater than 10. Note that filter()
is another lazy function that I must coerce using the list
function in the REPL:
>>> list(filter(lambda n: n > 10, range(0, 5))) []
A list comprehension would also return an empty list:
>>> [n for n in range(0, 5) if n > 10] []
Figure 14.13 shows the relationship between creating a new list
called consonants
using an imperative for
-loop approach, an idiomatic list comprehension with a guard, and a purely functional approach using filter()
. All of these are perfectly acceptable, though the most Pythonic technique is probably the list comprehension. The for
loop would be very familiar to a C or Java programmer, while the filter()
approach would be immediately recognizable to the Haskeller or even someone from a Lisp-like language. The filter()
might be slower than the list comprehension, especially if the iterable were large. Choose whichever way makes more sense for your style and application.
We talked in the introduction about the individual parts of the regular expression I ended up using. I’d like to take a moment to mention the way I formatted the regex in the code. I used an interesting trick of the Python interpreter that will implicitly concatenate adjacent string literals. See how these four strings become one:
>>> this_is_just_to_say = ('I have eaten ' ... 'the plums ' ... 'that were in ' ... 'the icebox') >>> this_is_just_to_say 'I have eaten the plums that were in the icebox'
Note that there are no commas after each string, as that would create a tuple
with four individual strings:
>>> this_is_just_to_say = ('I have eaten ', ... 'the plums ', ... 'that were in ', ... 'the icebox') >>> this_is_just_to_say ('I have eaten ', 'the plums ', 'that were in ', 'the icebox')
The advantage of writing out the regular expression on separates lines is that you can add comments to help your reader understand each part:
pattern = ( '([' + consonants + ']+)?' # capture one or more, optional '([' + vowels + '])' # capture at least one vowel '(.*)' # capture zero or more of anything )
The individual strings will be concatenated by Python into a single string:
>>> pattern '([bcdfghjklmnpqrstvwxyz]+)?([aeiou])(.*)'
I could have written the entire regex on one line, but ask yourself which version would you rather read and maintain, the preceding version or the following:1
pattern = f'([{consonants}]+)?([{vowels}])(.*)'
One of the very interesting things about Python code is that your rhymer.py program is also--kind of, sort of--a sharable module of code. That is, you haven’t explicitly written it to be a container of reusable (and tested!) functions, but it is. You can even run the functions from inside the REPL.
For this to work, be sure you run python3
inside the same directory as the rhymer.py code:
>>> from rhymer import stemmer
Now you can run and test your stemmer()
function manually:
>>> stemmer('apple') ('', 'apple') >>> stemmer('banana') ('b', 'anana') >>> import string >>> stemmer(string.punctuation) ('!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~', '')
If you don’t explicitly import
a function, you can use the fully qualified function name by adding the module name to the front:
>>> import rhymer >>> rhymer.stemmer('cake') ('c', 'ake') >>> rhymer.stemmer('chair') ('ch', 'air')
As you write more and more programs, you will find yourself solving some of the same problems repeatedly. It’s far better to create modules with reusable code than to copy pieces from one program to another. If you ever find a bug in a shared function, you can fix it once, and all the programs sharing the function get the fix. The alternative is to find the duplicated code in every program and change it (hoping that this doesn’t introduce even more problems because the code is entangled with other code).
I decided that my stemmer()
function would always return a 2-tuple of the (start,
rest)
for any given word. As such, I can unpack the two values into two variables:
>>> start, rest = stemmer('cat') >>> start 'c' >>> rest 'at'
If there is a value for rest
, I can add all my prefixes
to the beginning:
>>> prefixes = list('bcdfghjklmnpqrstvwxyz') + ( ... 'bl br ch cl cr dr fl fr gl gr pl pr sc ' ... 'sh sk sl sm sn sp st sw th tr tw wh wr' ... 'sch scr shr sph spl spr squ str thr').split()
I decided to use another list comprehension with a guard to skip any prefix that is the same as the start
of the word. The result will be a new list
that I pass to the sorted()
function to get the correctly ordered strings:
>>> sorted([p + rest for p in prefixes if p != start]) ['bat', 'blat', 'brat', 'chat', 'clat', 'crat', 'dat', 'drat', 'fat', 'flat', 'frat', 'gat', 'glat', 'grat', 'hat', 'jat', 'kat', 'lat', 'mat', 'nat', 'pat', 'plat', 'prat', 'qat', 'rat', 'sat', 'scat', 'schat', 'scrat', 'shat', 'shrat', 'skat', 'slat', 'smat', 'snat', 'spat', 'sphat', 'splat', 'sprat', 'squat', 'stat', 'strat', 'swat', 'tat', 'that', 'thrat', 'thwat', 'trat', 'twat', 'vat', 'wat', 'what', 'wrat', 'xat', 'yat', 'zat']
I then print()
that list
, joined on newlines. If there is no rest
of the given word, I print()
a message that the word cannot be rhymed:
if rest: print(' '.join(sorted([p + rest for p in prefixes if p != start]))) else: print(f'Cannot rhyme "{args.word}"')
It is certainly possible to write a solution that does not use regular expressions. We could start by finding the first position of a vowel in the given string. If one is present, we could use a list slice to return the portion of the string up to that position and the portion starting at that position:
def stemmer(word): """Return leading consonants (if any), and 'stem' of word""" word = word.lower() ① vowel_pos = list(map(word.index, filter(lambda v: v in word, 'aeiou'))) ② if vowel_pos: ③ first_vowel = min(vowel_pos) ④ return (word[:first_vowel], word[first_vowel:]) ⑤ else: ⑥ return (word, '') ⑦
① Lowercase the given word to avoid dealing with uppercase letters.
② Filter the vowels 'aeiou' to find those in word, and then map the present vowels to word.index to find their positions. This is one of the rare instances when we need to use the list() function to coerce Python into evaluating the lazy map() function because the next if statement needs a concrete value.
③ Check if there are any vowels present in the word.
④ Find the index of the first vowel by taking the minimum (min) value from the positions.
⑤ Return a tuple of a slice of the word up to the first vowel, and another starting at the first vowel.
⑥ Otherwise, no vowels were found in the word.
⑦ Return a 2-tuple of the word and the empty string to indicate there is no rest of the word to use for rhyming.
Add an --output
option to write the words to a given file. The default should be to write to STDOUT
.
Read an input file and create rhyming words for all the words in the file. You can borrow from the program in chapter 6 to read a file and break it into words, then iterate each word, and create an output file for each word with the rhyming words.
Write a new program that finds all unique consonant sounds in a dictionary of English words. (I have included inputs/words.txt.zip, which is a compressed version of the dictionary from my machine. Unzip the file to use inputs/ words.txt.) Print the output in alphabetical order and use those to expand this program’s consonants.
Alter your program to only emit words that are found in the system dictionary (for example, inputs/words.txt).
Write a program to create Pig Latin, where you move the initial consonant sound from the beginning of the word to the end and add “-ay,” so that “cat” becomes “at-cay.” If a word starts with a vowel, add “-yay” to the end so that “apple” becomes “apple-yay.”
Write a program to create spoonerisms, where the initial consonant sounds of adjacent words are switched, so you get “blushing crow” instead of “crushing blow.”
Regular expressions allow you to declare a pattern that you wish to find. The regex engine will sort out whether the pattern is found or not. This is a declarative approach to programming, in contrast to the imperative method of manually seeking out patterns by writing code ourselves.
You can wrap parts of the pattern in parentheses to “capture” them into groups that you can fetch from the result of re.match()
or re.search()
.
You can add a guard to a list comprehension to avoid taking some elements from an iterable.
The filter()
function is another way to write a list comprehension with a guard. Like map()
, it is a lazy, higher-order function that takes a function that will be applied to every element of an iterable. Only those elements that are deemed “truthy” by the function are returned.
Python can evaluate many types--including strings, numbers, lists, and dictionaries--in a Boolean context to arrive at a sense of “truthiness.” That is, you are not restricted to just True
and False
in if
expressions. The empty string ''
, the int
0
, the float
0.0
, the empty list[]
, and the empty dict{}
are all considered “falsey,” so any non-falsey value from those types, like the non-empty str
, list
, or dict
, or any numeric value not zero-ish, will be considered “truthy.”
You can break long string literals into shorter adjacent strings in your code to have Python join them into one long string. It’s advisable to break long regexes into shorter strings and add comments on each line to document the function of each pattern.
Write small functions and tests, and share them in modules. Every .py file can be a module from which you can import
functions. Sharing small, tested functions is better than writing long programs and copying/pasting code as needed.
18.118.102.225