How it works...

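Let's take a closer look at the two expressions used in the previous code example. Comparing them with each other, the only change is the extra [1-9] alternative added to the month group of the second expression:

  ^(19|20)\d\d[-./](0[1-9]|1[0-2])[-./](0[1-9]|[12][0-9]|3[01])$
  ^(19|20)\d\d[-./](0[1-9]|1[0-2]|[1-9])[-./](0[1-9]|[12][0-9]|3[01])$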

Before we get to what that change means, let's break up the expression and view the individual components. In plain terms, the regex matches date strings whose year starts with 19 or 20 and whose parts are separated by one of the following characters:

  • Dash (-)
  • Period (.)
  • Forward slash (/)

To understand the expression better, note that it follows the format <Valid Years><Valid Separators><Valid Months><Valid Separators><Valid Days>.

We also need to be able to tell the regex engine to consider one OR another pattern. The word OR is symbolized by the | metacharacter. To make the regex engine apply OR to only part of the pattern, rather than splitting the whole expression into two alternatives, we wrap the alternatives in parentheses ().
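The effect of the parentheses can be seen in a minimal sketch. Python's re module is used here purely as an illustration (an assumption, since this section does not show the recipe's own code), and the names ungrouped, grouped, and matches are hypothetical.

```python
import re

# Without parentheses, | splits the WHOLE pattern into two alternatives:
# "^19" OR "20\d\d$".
ungrouped = re.compile(r"^19|20\d\d$")

# With parentheses, | only chooses between 19 and 20; \d\d applies to both.
grouped = re.compile(r"^(19|20)\d\d$")


def matches(pattern, text):
    """Return True when the pattern is found in the text."""
    return pattern.search(text) is not None


print(matches(ungrouped, "19xyz"))  # True: "^19" alone is satisfied
print(matches(grouped, "19xyz"))    # False: two digits must follow 19
print(matches(grouped, "1987"))     # True
print(matches(grouped, "2016"))     # True
```

The ungrouped pattern accepts "19xyz" because the first alternative is just ^19, which is why the grouping matters for the year portion.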

Here are the symbols used in the regex:

The conditional OR

  • |: This denotes the OR metacharacter.

The year portion

  • (19|20): Only allows 19 or 20.
  • \d\d: Matches two digits, each between 0 and 9. To match only one digit between 0 and 9, you would use \d.

The valid separator character set

  • [-./]: Matches any one of the characters in the character set. These are our valid separators. To also accept a space as a date separator, you would change this to [- ./], adding a space anywhere after the leading dash. Here, the space sits between the dash and the period.

Valid digits for months and days

  • 0[1-9]: Matches a zero followed by any digit between 1 and 9. This will match 01, 02, 03, 04, 05, 06, 07, 08, and 09.
  • 1[0-2]: Matches a 1 followed by any digit between 0 and 2. This will match 10, 11, or 12.
  • [1-9]: Matches any single digit between 1 and 9.
  • [12][0-9]: Matches a 1 or 2 followed by any digit between 0 and 9. This will match all number strings between 10 and 29.
  • 3[01]: Matches a 3 followed by 0 or 1. This will match 30 or 31.

Start and end of string

  • ^: Anchors the match to the beginning of the given string.
  • $: Anchors the match to the end of the given string.
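Each of these building blocks can be tried on its own before they are combined. The following is a small sketch in Python (an illustrative choice, not the recipe's own code); the constant names and the accepts helper are hypothetical.

```python
import re

# Building blocks from the symbol list above (names are illustrative).
SEPARATOR = r"[-./]"
MONTH = r"(0[1-9]|1[0-2])"
DAY = r"(0[1-9]|[12][0-9]|3[01])"


def accepts(fragment, text):
    """Return True when the fragment matches the entire text."""
    return re.fullmatch(fragment, text) is not None


print(accepts(SEPARATOR, "-"))  # True
print(accepts(SEPARATOR, " "))  # False: space is not in the set
print(accepts(MONTH, "09"))     # True
print(accepts(MONTH, "13"))     # False
print(accepts(MONTH, "4"))      # False: no leading zero
print(accepts(DAY, "29"))       # True
print(accepts(DAY, "31"))       # True
print(accepts(DAY, "32"))       # False
```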

The first regex we created reads as follows:

  • ^: Start at the beginning of the string to match
  • (19|20): Check whether the string starts with 19 or 20
  • \d\d: After the check, two digits between 0 and 9 must follow
  • [-./]: The year portion ends and is followed by a date separator
  • (0[1-9]|1[0-2]): Find the month logic by looking for digits starting with 0, followed by any digit between 1 and 9, OR digits starting with 1, followed by any digit between 0 and 2
  • [-./]: The month logic ends followed by a date separator
  • (0[1-9]|[12][0-9]|3[01]): Then, find the day logic by looking for digits starting with 0, followed by a digit between 1 and 9, OR digits starting with 1 or 2, followed by any digit between 0 and 9, OR the digit 3, followed by 0 or 1
  • $: The match must end at the end of the string
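Assembled from the parts listed above, the first expression can be exercised against a few sample strings. This is a sketch in Python rather than the recipe's own code; FIRST_REGEX and is_valid_date are hypothetical names, and the sample dates other than 2016.4.10 are made up for illustration.

```python
import re

# The first expression, assembled from the components listed above.
FIRST_REGEX = re.compile(
    r"^(19|20)\d\d[-./](0[1-9]|1[0-2])[-./](0[1-9]|[12][0-9]|3[01])$"
)


def is_valid_date(text):
    """Return True when the text matches the first date expression."""
    return FIRST_REGEX.match(text) is not None


print(is_valid_date("2016-04-10"))  # True
print(is_valid_date("1999/12/31"))  # True
print(is_valid_date("2016.4.10"))   # False: single-digit month
print(is_valid_date("1816-04-10"))  # False: year must start with 19 or 20
print(is_valid_date("2016-13-01"))  # False: month 13 is out of range
```

Note that the two separators are matched independently, so a string such as 2016-04.10 is also accepted by this expression.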

Our first regex was incorrect because its month logic was incomplete. The month group (0[1-9]|1[0-2]) looks only for a 0 followed by any digit between 1 and 9, or a 1 followed by any digit between 0 and 2.

This matches 01 through 09, or 10, 11, and 12. The date that it didn't match was 2016.4.10 (the date separators don't make a difference here). This is because the month came through as a single digit, while the expression only accepted single-digit months padded with a leading zero. To fix this, we had to modify the month logic to also accept a single digit between 1 and 9. We did this by adding [1-9] as the final alternative in the month group, giving (0[1-9]|1[0-2]|[1-9]).

The modified regex then read as follows:

  • ^: Start at the beginning of the string to match
  • (19|20): Check whether the string starts with 19 or 20
  • \d\d: After the check, two digits between 0 and 9 must follow
  • [-./]: The year portion ends followed by a date separator
  • (0[1-9]|1[0-2]|[1-9]): Find the month logic by looking for digits starting with 0, followed by any digit between 1 and 9, OR digits starting with 1, followed by any digit between 0 and 2, OR any single digit between 1 and 9
  • [-./]: The month logic ends followed by a date separator
  • (0[1-9]|[12][0-9]|3[01]): Then, find the day logic by looking for digits starting with 0, followed by a digit between 1 and 9, OR digits starting with 1 or 2, followed by any digit between 0 and 9, OR the digit 3, followed by 0 or 1
  • $: The match must end at the end of the string
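The difference between the two expressions can be checked side by side. The sketch below uses Python for illustration only; ORIGINAL, MODIFIED, and is_match are hypothetical names, and both patterns are assembled from the components described in this section.

```python
import re

# Month group without and with the extra [1-9] alternative.
ORIGINAL = re.compile(
    r"^(19|20)\d\d[-./](0[1-9]|1[0-2])[-./](0[1-9]|[12][0-9]|3[01])$"
)
MODIFIED = re.compile(
    r"^(19|20)\d\d[-./](0[1-9]|1[0-2]|[1-9])[-./](0[1-9]|[12][0-9]|3[01])$"
)


def is_match(pattern, text):
    """Return True when the pattern matches the text."""
    return pattern.match(text) is not None


print(is_match(ORIGINAL, "2016.4.10"))   # False
print(is_match(MODIFIED, "2016.4.10"))   # True: [1-9] accepts the month 4
print(is_match(MODIFIED, "2016.04.10"))  # True: padded months still work
print(is_match(MODIFIED, "2016.10.10"))  # True: 1[0-2] still handles 10
print(is_match(MODIFIED, "2016.4.5"))    # False: the day group is unchanged
```

The last case shows that the fix applies to the month only; a single-digit day is still rejected because the day group has no [1-9] alternative.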

This is a basic regex, and we say basic because there is a lot more we can do to make the expression better. We can include logic to consider alternative date formats such as mm-dd-yyyy or dd-mm-yyyy. We can add logic to check February and validate that it contains only 28 days, unless it is a leap year, in which case we need to allow the twenty-ninth day of February. Furthermore, we can also extend the regex to check that January, March, May, July, August, October, and December have 31 days while April, June, September, and November contain only 30 days.
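As one possible way to cover the month-length and leap-year rules, the sketch below does not extend the regex itself. Instead, it swaps in a different technique: the regex captures the year, month, and day, and Python's standard datetime.date constructor performs the calendar check. The names DATE_PARTS and is_real_date are hypothetical, and the approach is an assumption rather than the recipe's method.

```python
import re
from datetime import date

# Same shape as the modified expression, but the whole year is captured
# in one group so the three parts can be read back as numbers.
DATE_PARTS = re.compile(
    r"^((?:19|20)\d\d)[-./](0[1-9]|1[0-2]|[1-9])[-./](0[1-9]|[12][0-9]|3[01])$"
)


def is_real_date(text):
    """Return True when the text has the expected shape AND names a real day."""
    match = DATE_PARTS.match(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


print(is_real_date("2016-02-29"))  # True: 2016 is a leap year
print(is_real_date("2015-02-29"))  # False: 2015 is not a leap year
print(is_real_date("2016-04-31"))  # False: April has 30 days
print(is_real_date("2016-04-30"))  # True
```

Leaving the calendar rules to a date type keeps the expression short; encoding the same rules in the regex alone is possible but makes it considerably longer.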
