Using regular expressions in tests

In the chapter after the one that introduced if-then-else, we learned all about regular expressions. That chapter, however, was mostly theoretical and contained only a single script! As you might realize by now, regular expressions are mostly supporting constructs, meant to be used together with other scripting tools. In the tests we have described, we can use both globbing and regular expressions within the [[...]] blocks! Let's look at this in more depth, as follows:

reader@ubuntu:~/scripts/chapter_11$ vim square-number.sh 
reader@ubuntu:~/scripts/chapter_11$ cat square-number.sh 
#!/bin/bash

#####################################
# Author: Sebastiaan Tammer
# Version: v1.0.0
# Date: 2018-10-26
# Description: Return the square of the input number.
# Usage: ./square-number.sh <number>
#####################################

INPUT_NUMBER=$1

# Check the number of arguments received.
if [[ $# -ne 1 ]]; then
 echo "Incorrect usage, wrong number of arguments."
 echo "Usage: $0 <number>"
 exit 1
fi

# Check to see if the input is a number.
if [[ ! ${INPUT_NUMBER} =~ [[:digit:]] ]]; then 
 echo "Incorrect usage, wrong type of argument."
 echo "Usage: $0 <number>"
 exit 1
fi

# Multiply the input number by itself and return the result to the user.
echo $((${INPUT_NUMBER} * ${INPUT_NUMBER}))

We first check if the user supplied the correct number of arguments (which is what we should always do). Next, we use the =~ operator within the [[...]] test block. This operator lets us evaluate the left-hand side against a regular expression. In this case, it simply allows us to verify that the user input is a number, and not anything else.

Now, if we call this script, we will see the following:

reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh
Incorrect usage, wrong number of arguments.
Usage: square-number.sh <number>
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh 3 2
Incorrect usage, wrong number of arguments.
Usage: square-number.sh <number>
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh a
Incorrect usage, wrong type of argument.
Usage: square-number.sh <number>
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh 3
9
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh 11
121

We can see that both of our input checks work. If we call this script with anything other than exactly one argument ($# -ne 1), it fails. This is true for both 0 and 2 arguments. Next, if we call the script with a letter instead of a number, we get to the second check and the resulting error message: wrong type of argument. Finally, to prove that the script actually does what we want, we try it with two numbers: 3 and 11. The results, 9 and 121, are the squares of these numbers, so it seems we achieved our goal!

However, not everything is always as it seems. The following output illustrates a common pitfall when using regular expressions:

reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh a3
0
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh 3a
square-number.sh: line 28: 3a: value too great for base (error token is "3a")

How did this happen? We checked to see if the user input was a number, did we not? Contrary to what you might think, we actually checked whether the user input matched positively against a digit. To say this in simpler terms, if the input contains a digit anywhere, the check succeeds. Both a3 and 3a therefore pass the check and reach the arithmetic: there, a3 is read as the name of an unset variable (which evaluates to 0), while 3a is not a valid number, hence the error. What we really want to check is whether the input is a number from the beginning to the end. Perhaps this sounds familiar, because it definitely smells like line anchors! The following code applies this:

reader@ubuntu:~/scripts/chapter_11$ vim square-number.sh
reader@ubuntu:~/scripts/chapter_11$ head -5 square-number.sh 
#!/bin/bash

#####################################
# Author: Sebastiaan Tammer
# Version: v1.1.0
reader@ubuntu:~/scripts/chapter_11$ grep 'digit' square-number.sh 
if [[ ! ${INPUT_NUMBER} =~ ^[[:digit:]]$ ]]; then

We made two changes: the search pattern we're matching against is no longer just [[:digit:]], but ^[[:digit:]]$, and we updated the version number (something we haven't done too much up until now). Because we are now anchoring the digit to both the beginning and the end of the line, we can no longer inject a letter in a random spot. Run the script with incorrect input to verify this:

reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh a3
Incorrect usage, wrong type of argument.
Usage: square-number.sh <number>
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh 3a
Incorrect usage, wrong type of argument.
Usage: square-number.sh <number>
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh 3a3
Incorrect usage, wrong type of argument.
Usage: square-number.sh <number>
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh 9
81

I would love to tell you that we're now perfectly safe. But, alas, as it often is with regular expressions, it's not that simple. The script now works great for a single digit (0–9), but if you try it with a double-digit number, it fails with the wrong type of argument error (try it out!). We need one final tweak to make sure it does exactly what we want: the pattern must also accept multiple sequential digits. The one or more construct in regular expressions is the + sign, which we can just append to [[:digit:]]:

reader@ubuntu:~/scripts/chapter_11$ vim square-number.sh 
reader@ubuntu:~/scripts/chapter_11$ head -5 square-number.sh 
#!/bin/bash

#####################################
# Author: Sebastiaan Tammer
# Version: v1.2.0
reader@ubuntu:~/scripts/chapter_11$ grep 'digit' square-number.sh 
if [[ ! ${INPUT_NUMBER} =~ ^[[:digit:]]+$ ]]; then 
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh 15
225
reader@ubuntu:~/scripts/chapter_11$ bash square-number.sh 1x5
Incorrect usage, wrong type of argument.
Usage: square-number.sh <number>

We changed the pattern, upped the version number, and ran the script with different inputs. The final pattern of ^[[:digit:]]+$ can be read as one or more digits from the start to the end of the line, which, in this case, means a number, and nothing else!
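
To see the three versions of the pattern side by side, the following minimal sketch runs each of them against the inputs we used earlier. The matches_pattern function is a hypothetical helper written for this illustration; it is not part of square-number.sh.

```bash
#!/bin/bash

# Hypothetical helper (not part of square-number.sh): prints "match" or
# "no match" for the given input and regular expression.
matches_pattern() {
  local input=$1
  local pattern=$2
  # The pattern variable is left unquoted on the right-hand side of =~;
  # quoting it would make bash treat it as a literal string.
  if [[ ${input} =~ ${pattern} ]]; then
    echo "match"
  else
    echo "no match"
  fi
}

# Try every input against the patterns from v1.0.0, v1.1.0, and v1.2.0.
for input in 3 15 a3 3a 1x5; do
  for pattern in '[[:digit:]]' '^[[:digit:]]$' '^[[:digit:]]+$'; do
    printf '%-4s %-16s %s\n' "${input}" "${pattern}" \
      "$(matches_pattern "${input}" "${pattern}")"
  done
done
```

Of the three, only the final pattern accepts both 3 and 15 while rejecting a3, 3a, and 1x5.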

The lesson here is that you really need to test your regular expressions thoroughly. As you know by now, an unanchored search pattern counts as a match as soon as any part of the input matches it. As seen in the previous example, this was not specific enough. The only way to get this right (and to learn!) is by trying to break your own scripts. Try wrong input, weird input, very specific input, and so on. Unless you try a lot of different inputs, you can't be confident that the pattern works as intended.
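
One way to make this kind of testing repeatable is to write down a list of inputs together with the outcome you expect for each. The following is a minimal sketch of that idea for the final pattern; the is_number and check_input functions and the list of inputs are assumptions made for this example, not something the script itself contains.

```bash
#!/bin/bash

# The final pattern from square-number.sh (v1.2.0).
PATTERN='^[[:digit:]]+$'

# Hypothetical helper: returns 0 if the input is accepted, 1 otherwise.
is_number() {
  [[ $1 =~ ${PATTERN} ]]
}

# Hypothetical helper: compares the actual outcome for one input with the
# expected outcome ("accept" or "reject") and reports the result.
check_input() {
  local input=$1
  local expected=$2
  local actual="reject"
  if is_number "${input}"; then
    actual="accept"
  fi
  if [[ ${actual} == "${expected}" ]]; then
    echo "PASS: '${input}' -> ${actual}"
  else
    echo "FAIL: '${input}' -> ${actual} (expected ${expected})"
    return 1
  fi
}

# Correct input.
check_input "3" accept
check_input "15" accept

# Wrong and weird input.
check_input "" reject
check_input "a3" reject
check_input "3a" reject
check_input "1x5" reject
check_input " 3" reject
check_input "-3" reject
check_input "3.5" reject
```

Note that this pattern also rejects negative numbers and decimals. Whether that is what you want depends on the script, which is exactly the kind of question a list like this forces you to answer.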

You can use all regular expression search patterns in test syntax. Other examples that we will not flesh out, but should definitely be considered, are as follows:

The variable should start with a / (for a fully qualified path)
The variable cannot contain whitespace (using the [[:blank:]] search pattern, which matches spaces and tabs)
The variable should contain only lowercase letters (achievable with the ^[[:lower:]]+$ pattern)
The variable should contain a file name with an extension (can match on [[:alnum:]]\.[[:alpha:]], with the dot escaped so that it matches a literal dot instead of any character)
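
As a rough sketch, these four checks could be written as small functions like the ones below. All function names are hypothetical and chosen only for this illustration. The patterns are stored in variables first, which keeps the escaped dot in the last pattern intact when it is used with =~.

```bash
#!/bin/bash

# Hypothetical helpers, one per check in the list above. Each returns 0
# when the value passes the check and 1 when it does not.

starts_with_slash() {
  local pattern='^/'
  [[ $1 =~ ${pattern} ]]
}

has_no_blanks() {
  # [[:blank:]] matches spaces and tabs; the check passes when none is found.
  local pattern='[[:blank:]]'
  [[ ! $1 =~ ${pattern} ]]
}

is_lowercase_only() {
  local pattern='^[[:lower:]]+$'
  [[ $1 =~ ${pattern} ]]
}

has_extension() {
  # The dot is escaped so that it matches a literal dot, not any character.
  local pattern='[[:alnum:]]\.[[:alpha:]]'
  [[ $1 =~ ${pattern} ]]
}

# Example values, made up for this demonstration.
for value in "/etc/passwd" "my file.txt" "hostname"; do
  if starts_with_slash "${value}"; then echo "'${value}' starts with a /"; fi
  if ! has_no_blanks "${value}"; then echo "'${value}' contains a space or tab"; fi
  if is_lowercase_only "${value}"; then echo "'${value}' is lowercase only"; fi
  if has_extension "${value}"; then echo "'${value}' has an extension"; fi
done
```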
