Using regexes to find normal characters

In this section, you will learn how to create regex statements that check for letters, numbers, and optional formatting characters, as well as combinations of these characters.

Matching specific strings of letters and/or numbers works in exactly the same way as in a standard Ctrl + F style search, except that it is case-sensitive by default; the search string abc would match the words crabcake, drabcloth, and labcoat, but would not match the channel ABC or the taxi firm CabCo.
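The case-sensitive behavior described above can be checked with a minimal sketch. It uses Python's standard `re` module purely for illustration (the choice of Python and the variable names are assumptions, not part of the original text):

```python
import re

words = ["crabcake", "drabcloth", "labcoat", "ABC", "CabCo"]

# A plain search string matches anywhere inside the text, but only
# when the case is identical.
case_sensitive = [w for w in words if re.search("abc", w)]

# Case-insensitive matching has to be requested explicitly.
case_insensitive = [w for w in words if re.search("abc", w, re.IGNORECASE)]

print(case_sensitive)
print(case_insensitive)
```

With the `re.IGNORECASE` flag, ABC and CabCo are matched as well.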

The search string 123 would match any number containing those three digits in that sequence. Log files (and other things that contain autogenerated dates) should be labeled in ISO 8601-compliant format (YYYYMMDD), so 123 would also match any log file from December 30 or 31 each year.

ISO 8601 fixes the order of the year, month, and day (plus hours, minutes, seconds, and so on). Strictly speaking, it permits only the hyphenated form (1999-12-31) or the unseparated form (19991231), but real-world logs also use other separators (for example, 1999/12/31). Thankfully, a standard search can also find most keyboard-accessible characters (including Unicode characters for international users).

The characters \ ? $ ^ * ( ) + | . [ ] cannot be searched for directly, as they have a special meaning in regexes. To search for one of these characters, you must put the escape character (\) before it. To search for the date 1999.12.31, you would enter the search string 1999\.12\.31. This method still works when searching for the escape character itself, so, for 1999\12\31, you would enter the search string 1999\\12\\31.
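As a small illustration of escaping, the sketch below compares an unescaped pattern with escaped ones. It assumes Python's `re` module; the sample strings and variable names are made up for the example:

```python
import re

# Three hypothetical date strings that differ only in their separator.
dates = ["1999.12.31", "1999-12-31", "1999\\12\\31"]

unescaped = re.compile(r"1999.12.31")              # . matches any single character
escaped_dot = re.compile(r"1999\.12\.31")          # \. matches a literal dot only
escaped_backslash = re.compile(r"1999\\12\\31")    # \\ matches a literal backslash

loose = [d for d in dates if unescaped.fullmatch(d)]
dots_only = [d for d in dates if escaped_dot.fullmatch(d)]
backslashes_only = [d for d in dates if escaped_backslash.fullmatch(d)]

# re.escape adds the escape characters for you.
auto_escaped = re.escape("1999.12.31")
```

The unescaped pattern accepts all three strings because the dot is a wildcard, while each escaped pattern accepts only the string with the matching literal separator.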

The escape character, which changes special characters to work as normal characters, can also be used to make normal characters special. This opens up a number of wildcard characters.

In some search systems (for example, SQL), wildcards are available that match multiple (including zero) characters or single characters (SQL uses % and _, where Microsoft uses * and ?); a regex allows for more granular control over what type of characters are matched. A regex can differentiate between digits (\d) and non-digits (\D), alphanumeric characters (\w) and non-alphanumeric characters (\W), whitespace such as spaces, tabs, or line breaks (\s) and non-whitespace (\S), or just any character (.). For each of these types of character (or indeed specific characters), a regex also uses quantifiers, which specify multiple (including zero) characters (*), multiple (not including zero) characters (+), zero or one characters (?), or a specified quantity range of characters ({m,n}).
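The character classes and quantifiers can be tried out with a short sketch. Python's `re` module is assumed here, and the sample text is invented for the example:

```python
import re

text = "a1 b22 c333"

one_or_more_digits = re.findall(r"\d+", text)       # + : one or more
two_to_three_digits = re.findall(r"\d{2,3}", text)  # {m,n} : between m and n
non_digit_runs = re.findall(r"\D+", text)           # \D : anything but a digit

# ? makes the preceding character optional.
optional_u = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]

# * allows zero occurrences, + requires at least one.
star_matches_zero = re.fullmatch(r"ab*c", "ac") is not None
plus_matches_zero = re.fullmatch(r"ab+c", "ac") is not None
```

Note that `\w` in Python also matches the underscore, so "alphanumeric" is a slight simplification there.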

To match a sequence a number of times, parentheses (()) can be used, followed by the quantifier.
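The difference between quantifying a single character and quantifying a parenthesized sequence can be sketched as follows (Python's `re` module assumed; the strings are illustrative only):

```python
import re

# Without parentheses, the quantifier applies to the last character only.
single_char = re.fullmatch(r"ab{3}", "abbb") is not None

# With parentheses, the quantifier applies to the whole sequence.
whole_group = re.fullmatch(r"(ab){3}", "ababab") is not None

# The grouped pattern does not accept the repeated-character form.
group_on_abbb = re.fullmatch(r"(ab){3}", "abbb") is not None
```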

In the following two diagrams, a regex is used to match all events occurring in the first minute of January 13th of any year this millennium. Compare the use of the quantifiers in \d{3} for exactly three digits, \d{0,2} for zero, one, or two digits, and (00:?){2} for the literal string 00:00: (where the colon after each 00 is optional):

Regex matches examples with wildcards and quantifiers
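The exact pattern from the diagrams is not reproduced here. The following is a hedged reconstruction that uses the same three quantifiers, assuming a hypothetical timestamp layout of YYYY-MM-DDTHH:MM:SS at the start of each log line (both the layout and the sample lines are assumptions):

```python
import re

# 2\d{3}    : a year this millennium (2 followed by exactly three digits)
# -01-13T   : January 13th, then the assumed date/time separator
# (00:?){2} : hour 00 and minute 00, each optionally followed by a colon
# \d{0,2}   : zero, one, or two digits of seconds
first_minute = re.compile(r"2\d{3}-01-13T(00:?){2}\d{0,2}")

log_lines = [
    "2019-01-13T00:00:42 sshd: session opened",
    "2019-01-13T00:01:05 cron: job started",
    "1999-01-13T00:00:10 kernel: boot",
    "2024-01-13T00:00:00 sshd: session closed",
    "2019-02-13T00:00:00 cron: job finished",
]

hits = [line for line in log_lines if first_minute.search(line)]
for line in hits:
    print(line)
```

Only the first and fourth lines are kept: the second falls in minute 01, the third is from the previous millennium, and the fifth is from February.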

This regex is used in the following command to filter one large log file into another, filtered file. The greater-than sign (>) redirects the output of the command on its left into the file on its right instead of printing it to the command line. The -E option tells grep to use extended regex syntax, which allows for the {m,n} quantifiers. Notice how the regex string is enclosed in double quote marks ("). This stops the shell from interpreting any characters that are also special on the command line (for example, the parentheses):

Passing the results of grep to another file

Being able to search through a file is very useful, but the key benefit of doing this work on the command line is the ability to write each matching line to a second file, for either reference or evidential/archive purposes. This means that Linux log files can be queried much more like a database than a straightforward, flat, serial file.
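The original command is not reproduced here. As a rough analogue of the filter-and-redirect idea, the sketch below copies every matching line from one file into another using Python. The function name `filter_log`, the file names, and the sample log content are all hypothetical:

```python
import re
import tempfile
from pathlib import Path


def filter_log(source: Path, target: Path, pattern: str) -> int:
    """Write every line of source that matches pattern to target.

    Returns the number of lines written.
    """
    regex = re.compile(pattern)
    kept = 0
    with source.open() as src, target.open("w") as dst:
        for line in src:
            if regex.search(line):
                dst.write(line)
                kept += 1
    return kept


# Demonstrate on a throwaway directory so no real log file is touched.
with tempfile.TemporaryDirectory() as tmp:
    big_log = Path(tmp) / "big.log"
    filtered_log = Path(tmp) / "filtered.log"
    big_log.write_text(
        "20191230 backup complete\n"
        "20190704 backup complete\n"
        "20191231 backup failed\n"
    )
    count = filter_log(big_log, filtered_log, r"123")
    result = filtered_log.read_text()

print(count)
print(result)
```

The filtered file can then be kept alongside the original log as a smaller, searchable record.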
