Chapter 9. Regular Expressions

Regular expressions (regex) remain something of a mystery to many developers. We use them often enough to warrant a deeper understanding of how they work. On the other hand, there are so many tried and tested regex patterns on the Internet that reusing an existing one is usually easier than writing your own. The subject of regex is also far larger than a single chapter of this book can cover.

Therefore, in this chapter, we will merely introduce some of the concepts of regex. For a deeper understanding of regex, further study is needed. For the purpose of this book, however, we will take a closer look at how regex are created and how they can be applied to some common programming problems. In this chapter, we will cover the following recipes:

  • Getting started with regex
  • Matching a valid date
  • Sanitizing input
  • Dynamic regex matching

Introduction

A regex is a pattern that describes a string through the use of special characters that denote a specific bit of text to match. The use of regex is not a new concept in programming. A regex does nothing on its own, however; it needs a regex engine, which does all the heavy lifting of applying the pattern to text.

The .NET Framework includes a regex engine. To use it, add a using directive for the System.Text.RegularExpressions namespace to your code file. This gives you access to the Regex class, which applies your regex pattern to the specific text you need to match.
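As a minimal sketch of what this looks like in practice, the following console program calls the static Regex.IsMatch method with a plain-text pattern. The class name RegexIntro and the helper ContainsCat are our own illustrative names and are not part of the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexIntro
{
    // Hypothetical helper: returns true when the input contains the text "cat".
    public static bool ContainsCat(string input)
    {
        // Regex.IsMatch(input, pattern) reports whether the pattern occurs
        // anywhere in the input string.
        return Regex.IsMatch(input, "cat");
    }

    public static void Main()
    {
        Console.WriteLine(ContainsCat("concatenate")); // True
        Console.WriteLine(ContainsCat("dog"));         // False
    }
}
```

A pattern without any special characters, such as the one used here, simply matches itself.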

Regex also have a specific set of metacharacters that hold special meaning to the regex engine. These characters are [ ], { }, ( ), *, +, \, ?, |, $, ., and ^. To match one of them literally, precede it with a backslash.
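To illustrate why metacharacters matter, the sketch below compares an unescaped dot with an escaped one and then uses Regex.Escape to add the backslashes automatically. The class and method names (MetacharacterDemo, MatchesLoosely, MatchesLiterally) are hypothetical and chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetacharacterDemo
{
    // The unescaped dot is a metacharacter that matches any single character.
    public static bool MatchesLoosely(string input)
    {
        return Regex.IsMatch(input, "3.50");
    }

    // The escaped dot matches only a literal '.' character.
    public static bool MatchesLiterally(string input)
    {
        return Regex.IsMatch(input, @"3\.50");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesLoosely("3x50"));   // True
        Console.WriteLine(MatchesLiterally("3x50")); // False
        Console.WriteLine(MatchesLiterally("3.50")); // True

        // Regex.Escape inserts the backslashes for us.
        Console.WriteLine(Regex.Escape("3.50"));     // 3\.50
    }
}
```

Regex.Escape is useful whenever part of a pattern comes from text that should be matched exactly as written.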

The curly brackets { }, for example, let developers specify the number of times the preceding character or group must occur; a{3} matches exactly three consecutive a characters. Square brackets, on the other hand, define a set of characters, any one of which may match at that position.
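The two ideas combine naturally. The following sketch, with the hypothetical names QuantifierDemo and IsFourDigits, uses square brackets to say which characters are allowed and curly brackets to say how many of them are required. The ^ and $ metacharacters anchor the pattern to the start and end of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // [0-9] allows any one digit; {4} requires exactly four of them.
    // ^ and $ anchor the match so that nothing else may surround the digits.
    public static bool IsFourDigits(string input)
    {
        return Regex.IsMatch(input, "^[0-9]{4}$");
    }

    public static void Main()
    {
        Console.WriteLine(IsFourDigits("2024")); // True
        Console.WriteLine(IsFourDigits("24"));   // False
        Console.WriteLine(IsFourDigits("ab12")); // False
    }
}
```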

If we, for example, specified [abc], the pattern would match a single lowercase a, b, or c. Regex also allow you to define a range, for example, [a-c], which is interpreted in exactly the same way as the [abc] pattern.

Regex also allow you to define characters to exclude by placing the ^ character first inside the square brackets. Therefore, [^a-c] matches any single character other than a lowercase a, b, or c. This includes lowercase d through z, but also uppercase letters, digits, spaces, and punctuation.
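One way to see which characters a set matches is to replace every match with a marker. The sketch below does this with Regex.Replace for a character list, a range, and an excluded range. The class name CharacterClassDemo, the helper Mask, and the sample input are our own choices for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // Hypothetical helper: replaces every character matched by the pattern with '*'.
    public static string Mask(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "*");
    }

    public static void Main()
    {
        string input = "abcdXY1";

        // A list and the equivalent range match the same characters.
        Console.WriteLine(Mask(input, "[abc]"));  // ***dXY1
        Console.WriteLine(Mask(input, "[a-c]"));  // ***dXY1

        // The excluded range matches everything else, not only d through z.
        Console.WriteLine(Mask(input, "[^a-c]")); // abc****
    }
}
```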

Regex also define \d and \D as shortcuts for [0-9] and [^0-9], respectively. Therefore, \d matches any digit, and \D matches any character that is not a digit. Another pair of shortcuts is \w and \W. Here, \w matches any letter from a to z, irrespective of case, any digit from 0 to 9, and the underscore character. Therefore, \w is [a-zA-Z0-9_], while \W is [^a-zA-Z0-9_]. Note that in .NET these ASCII-only equivalences hold exactly only when the RegexOptions.ECMAScript option is set; by default, \d and \w also match digits and letters from other Unicode scripts.
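The shortcuts can be tried out with a short sketch such as the following. The class name ShorthandDemo, its helper methods, and the sample strings are hypothetical. The last two calls show the effect of RegexOptions.ECMAScript on a digit from outside the ASCII range (U+0663, the Arabic-Indic digit three).

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShorthandDemo
{
    // \D matches every non-digit, so removing those leaves only the digits.
    public static string DigitsOnly(string input)
    {
        return Regex.Replace(input, @"\D", "");
    }

    // \W matches every character that is not a letter, digit, or underscore.
    public static string WordCharactersOnly(string input)
    {
        return Regex.Replace(input, @"\W", "");
    }

    public static void Main()
    {
        Console.WriteLine(DigitsOnly("Order 66"));                   // 66
        Console.WriteLine(WordCharactersOnly("user_1@example.com")); // user_1examplecom

        // By default \d accepts any Unicode decimal digit.
        Console.WriteLine(Regex.IsMatch("\u0663", @"\d"));           // True

        // With RegexOptions.ECMAScript, \d is restricted to [0-9].
        Console.WriteLine(Regex.IsMatch("\u0663", @"\d", RegexOptions.ECMAScript)); // False
    }
}
```

If only the ASCII digits 0 through 9 should be accepted, writing [0-9] explicitly is a simple alternative to setting the option.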

The basics of regex are rather easy to understand, but there is a lot more that you can do with regex.
