Using regexes to find characters in a set

In this section, we extend character checking to be more specific about which characters we are looking for. You should understand this in both directions: so that you can write your own regex statements, and so that you can read premade statements and work out what they will match.

There are times when you might want to be more specific than just digit/non-digit or alphanumeric/non-alphanumeric.

One example might be checking a Cisco router message log based on severity (where the levels range from 0 for emergencies through to 7 for debugging). While an 8 would match the \d search string, it would not be a valid severity level. Equally, if the logs were made from Simple Network Management Protocol (SNMP), the levels instead range from 1 for emergencies through to 8 for debugging.

Alternatively, we might examine a file access log for files with specific security classification identifiers (for example, U for unclassified, C for confidential, and S for secret). A letter D would not be a valid security level, even though it would match the alphanumeric criteria.

To select characters from a set, enclose the set within square brackets, []. The resulting search string matches any single character that appears in the list or falls within the range. The search string [abcd] will match any one of the letters a, b, c, or d, but not the letter e. The same matches could also be made using the search string [a-d], which represents the range a to d. This notation has obvious benefits for long, continuous sequences.
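As a minimal sketch of the ideas above, the following uses Python's `re` module (an assumption; the text itself is tool-agnostic at this point) to compare the list and range forms, and to check the two examples mentioned earlier. The names `severity`, `classification`, and `is_valid` are hypothetical and chosen only for illustration.

```python
import re

# [abcd] and [a-d] describe the same set of four letters.
explicit_set = re.compile(r"[abcd]")
range_set = re.compile(r"[a-d]")

for letter in "abcde":
    # fullmatch succeeds only if the whole string is one character from the set
    print(letter,
          bool(explicit_set.fullmatch(letter)),
          bool(range_set.fullmatch(letter)))

# Hypothetical validators for the two examples from the text.
severity = re.compile(r"[0-7]")         # Cisco log severity levels 0 to 7
classification = re.compile(r"[UCS]")   # unclassified, confidential, secret


def is_valid(pattern, value):
    """Return True if value is exactly one character from the pattern's set."""
    return pattern.fullmatch(value) is not None


print(is_valid(severity, "8"))          # 8 is a digit, but not in [0-7]
print(is_valid(classification, "D"))    # D is a letter, but not in [UCS]
```

Both sets accept a, b, c, and d and reject e, while the two validators reject values that the broader digit and alphanumeric classes would have let through.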

When using the hyphen to denote sequential letters or numbers, note that a range follows the continuous ASCII sequence. A breakdown of this is in the preceding diagram, but of particular note is the fact that the digits run from 0 to 9 (rather than 1 to 9, and then 0), and then there are a few punctuation characters before the capital letters; the lowercase letters run from 97 to 122, separated from the capital letters by further punctuation characters. This means that a sequence like [0-F], which you might expect to match only the characters of a valid hexadecimal representation, would also match the colon (:), semicolon (;), less-than (<), equals (=), greater-than (>), question mark (?), and at (@) symbols, because these sit between 9 and A in the ASCII sequence.
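To see exactly which characters a range such as [0-F] covers, one option is to test every printable ASCII character against it. The sketch below does this with Python's `re` module (an assumed tool, not one named in the text); `hex_attempt` and `matched` are illustrative names.

```python
import re

hex_attempt = re.compile(r"[0-F]")

# Test every printable ASCII character (codes 32 to 126) against the range.
matched = [chr(code) for code in range(32, 127)
           if hex_attempt.fullmatch(chr(code))]

# The range spans codes 48 ('0') to 70 ('F'), so it includes the seven
# punctuation characters with codes 58 to 64. The ampersand (code 38)
# lies below '0' and is therefore not part of the range.
print("".join(matched))   # 0123456789:;<=>?@ABCDEF
```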

To match only valid (uppercase) hexadecimal characters, use the string [0-9A-F]. Separating the two ranges with a comma, as in [0-9,A-F], is not recommended, as the set would then also match the comma (,) itself.
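The difference between these sets can be checked directly. The following is a small sketch in Python's `re` module (an assumption, as are the variable names); it also includes a third set, [0-9A-Fa-f], as one way to accept lowercase hexadecimal digits.

```python
import re

hex_digit = re.compile(r"[0-9A-F]")         # two ranges placed side by side
with_comma = re.compile(r"[0-9,A-F]")       # the comma becomes a set member
hex_any_case = re.compile(r"[0-9A-Fa-f]")   # a third range for lowercase

for char in ["7", "B", "b", ",", "@"]:
    print(repr(char),
          bool(hex_digit.fullmatch(char)),
          bool(with_comma.fullmatch(char)),
          bool(hex_any_case.fullmatch(char)))
```

Only the set containing the comma accepts ",", and only the set with the extra a-f range accepts "b"; none of the three accepts "@".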

In the following two diagrams, a regex statement is constructed for matching MAC addresses in the Linux format. Notice how the hexadecimal characters are matched to allow for capital letters (standard notation) or lowercase letters (the formatting preferred by Cisco IOS). An address in the Cisco format will still not match, however, as its separators are in different places (aaaa.bbbb.cccc instead of aa:bb:cc:dd:ee:ff):

Regex statement to match MAC addresses in the Linux format
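One way to write such a pattern is sketched below in Python. This is an assumption about the general shape of the statement, not necessarily the exact expression shown in the figure: six pairs of hexadecimal digits separated by colons, with both letter cases allowed. `HEX_PAIR` and `MAC_LINUX` are hypothetical names.

```python
import re

# Two hexadecimal digits, upper or lower case.
HEX_PAIR = "[0-9A-Fa-f]{2}"

# One pair, followed by five further pairs each preceded by a colon.
MAC_LINUX = re.compile(HEX_PAIR + "(:" + HEX_PAIR + "){5}")

for candidate in ["00:1A:2b:3C:4d:5E",    # Linux format, mixed case
                  "aaaa.bbbb.cccc",       # Cisco format, dot separators
                  "00:1A:2B:3C:4D:5G"]:   # G is not a hexadecimal digit
    print(candidate, bool(MAC_LINUX.fullmatch(candidate)))
```

Only the first candidate matches; the Cisco-style address fails because of its separators, not because of the case of its letters.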

This statement is used to extract the MAC addresses for all networking interfaces on the host as follows. The ifconfig command returns the interface information, which is then passed to the grep command using the pipe (|) character. Only the matched section of each line is printed, as we have issued the -o option in addition to the -E option (extended regex).

Using grep to filter the output of other system functions
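The effect of printing only the matched text, as grep does with -o, can be imitated in Python for experimentation. In the sketch below, `SAMPLE_OUTPUT` is made-up text in the style of ifconfig output (the addresses are invented), and `MAC_PATTERN` and `extract_macs` are hypothetical names; the pattern is an assumed form of the MAC address statement rather than the one in the figure.

```python
import re

# Assumed MAC address pattern: five "pair plus colon" groups, then a final pair.
MAC_PATTERN = re.compile(r"([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")

# Invented sample text, loosely modeled on ifconfig output.
SAMPLE_OUTPUT = """\
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.0.2.10  netmask 255.255.255.0  broadcast 192.0.2.255
        ether 00:1a:2b:3c:4d:5e  txqueuelen 1000  (Ethernet)
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 00:1A:2B:3C:4D:5F  txqueuelen 1000  (Ethernet)
"""


def extract_macs(text):
    """Return only the matched substrings, discarding the rest of each line."""
    return [match.group(0) for match in MAC_PATTERN.finditer(text)]


macs = extract_macs(SAMPLE_OUTPUT)
for mac in macs:
    print(mac)
```

The surrounding interface details are dropped and only the two addresses remain, one per interface, which is the form you would want when documenting equipment.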

Being able to use grep on the output of system commands as well as on existing log files is very important. A server might have multiple network interfaces, so this would be a legitimate method of documenting the available equipment. By adding a redirection (>) to the command line, or another pipe into a command such as tee, we could save the results of grep to a log file for later reference.
