Using Patterns to Match Dates or Times

Problem

You need to make sure a string looks like a date or time.

Solution

Use a pattern that matches the type of temporal value you expect. Be sure to consider issues such as how strict to be about delimiters between subparts and the lengths of the subparts.

Discussion

Dates are a validation headache because they come in so many formats. Pattern tests are extremely useful for weeding out illegal values, but often insufficient for full verification: a date might have a number where you expect a month, but if the number is 13, the date isn’t valid. This section introduces some patterns that match a few common date formats. Performing Validity Checking on Date or Time Subparts revisits this topic in more detail and discusses how to combine pattern tests with content verification.

To require values to be dates in ISO (CCYY-MM-DD) format, use this pattern:

/^\d{4}-\d{2}-\d{2}$/
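To see how strict the ISO pattern is, the following short script runs it against a few sample strings. It is only a sketch. The helper name looks_like_strict_iso is hypothetical and is not part of the recipes distribution.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $s has exactly 4 digits, dash, 2 digits, dash, 2 digits.
sub looks_like_strict_iso
{
  my $s = $_[0];
  return $s =~ /^\d{4}-\d{2}-\d{2}$/ ? 1 : 0;
}

for my $val ("2024-01-05", "2024-1-5", "2024/01/05", "24-01-05") {
  printf "%-12s %s\n", $val, looks_like_strict_iso ($val) ? "matches" : "no match";
}
```

Only 2024-01-05 matches. The others fail for different reasons:

- 2024-1-5 has short subparts.
- 2024/01/05 uses the wrong delimiter.
- 24-01-05 has a two-digit year.

One Perl subtlety: $ also matches just before a trailing newline, so "2024-01-05\n" passes too. If that matters (for example, with lines read from a file and not chomped), anchor with \z instead of $.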

The pattern requires the - character as the delimiter between date parts. To allow either - or / as the delimiter, use a character class between the numeric parts (the slashes are escaped with a backslash to prevent them from being interpreted as the end of the pattern constructor):

/^\d{4}[-\/]\d{2}[-\/]\d{2}$/

Or you can use a different delimiter around the pattern and avoid the backslashes:

m|^\d{4}[-/]\d{2}[-/]\d{2}$|
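The two ways of writing the delimiter-tolerant pattern are interchangeable. This sketch runs both against the same inputs so you can confirm that they agree; the helper names are hypothetical. Each character class is tested independently, so a value with mixed delimiters such as 2024-01/05 is also accepted.

```perl
use strict;
use warnings;

# Hypothetical helpers: the same pattern with escaped slashes inside /.../
# and with | as an alternate delimiter, where the slashes need no escaping.
sub iso_slash_escaped { return $_[0] =~ /^\d{4}[-\/]\d{2}[-\/]\d{2}$/ ? 1 : 0 }
sub iso_alt_delim     { return $_[0] =~ m|^\d{4}[-/]\d{2}[-/]\d{2}$| ? 1 : 0 }

for my $val ("2024-01-05", "2024/01/05", "2024-01/05", "2024.01.05") {
  printf "%-12s escaped=%d alternate=%d\n",
    $val, iso_slash_escaped ($val), iso_alt_delim ($val);
}
```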

To allow any non-digit delimiter (which corresponds to how MySQL operates when it interprets strings as dates), use this pattern:

/^\d{4}\D\d{2}\D\d{2}$/
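Because \D matches any single character that is not a digit, this pattern accepts letters and spaces as delimiters as well as punctuation. The hypothetical helper below illustrates what gets through.

```perl
use strict;
use warnings;

# Hypothetical helper: any single non-digit character between the parts.
sub iso_any_delim { return $_[0] =~ /^\d{4}\D\d{2}\D\d{2}$/ ? 1 : 0 }

for my $val ("2024.01.05", "2024 01 05", "2024x01x05", "2024--01-05", "20240105") {
  printf "%-12s %s\n", $val, iso_any_delim ($val) ? "matches" : "no match";
}
```

Each delimiter must be exactly one character, so 2024--01-05 is rejected. A value with no delimiters at all, such as 20240105, doesn't match either.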

If you don’t require the full number of digits in each part (to allow leading zeros in values like 03 to be missing, for example), just look for three nonempty digit sequences:

/^\d+\D\d+\D\d+$/

Of course, that pattern is so general that it will also match other values such as U.S. Social Security numbers (which have the format 012-34-5678). To constrain the subpart lengths by requiring two to four digits in the year part and one or two digits in the month and day parts, use this pattern:

/^\d{2,4}\D\d{1,2}\D\d{1,2}$/
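The following sketch compares the loose and the length-constrained patterns side by side, including the Social Security number case. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Loose: any three nonempty digit runs separated by single non-digits.
sub loose_date { return $_[0] =~ /^\d+\D\d+\D\d+$/ ? 1 : 0 }

# Hypothetical helper. Constrained: 2-4 digit year, then 1-2 digit month and day.
sub constrained_date { return $_[0] =~ /^\d{2,4}\D\d{1,2}\D\d{1,2}$/ ? 1 : 0 }

for my $val ("2024-3-5", "24/12/31", "012-34-5678", "12345-1-1") {
  printf "%-12s loose=%d constrained=%d\n",
    $val, loose_date ($val), constrained_date ($val);
}
```

Both patterns accept 2024-3-5 and 24/12/31. Only the loose pattern accepts 012-34-5678 (its last part has four digits) and 12345-1-1 (its first part has five).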

For dates in other formats such as MM-DD-YY or DD-MM-YY, similar patterns apply, but the subparts are arranged in a different order. This pattern matches both of those formats:

/^\d{2}-\d{2}-\d{2}$/
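MM-DD-YY and DD-MM-YY values have identical shapes, so a pattern alone cannot tell which ordering a value uses. This small sketch, with a hypothetical helper name, shows both orderings matching.

```perl
use strict;
use warnings;

# Hypothetical helper: exactly two digits in each of three dash-separated parts.
sub two_digit_date { return $_[0] =~ /^\d{2}-\d{2}-\d{2}$/ ? 1 : 0 }

# 06-17-59 (MM-DD-YY) and 17-06-59 (DD-MM-YY) both match. 6-17-59 does not,
# because the pattern requires exactly two digits per part.
for my $val ("06-17-59", "17-06-59", "6-17-59") {
  printf "%-9s %s\n", $val, two_digit_date ($val) ? "matches" : "no match";
}
```

A value such as 03-04-05 fits both interpretations. You must therefore know the source format before you extract parts from such values.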

If you need to check the values of individual date parts, use parentheses in the pattern and extract the substrings after a successful match. If you’re expecting dates to be in ISO format, for example, do something like this:

if ($val =~ /^(\d{2,4})\D(\d{1,2})\D(\d{1,2})$/)
{
  ($year, $month, $day) = ($1, $2, $3);
}
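A common Perl alternative to reading $1, $2, and $3 after the match is to assign the match result in list context. This returns the captured substrings, or an empty list on failure, and lets you declare the variables with my in the same statement. The function name split_iso is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (year, month, day) on a match, or an
# empty list if $val doesn't look like an ISO-style date.
sub split_iso
{
  my $val = $_[0];
  # A list assignment used as a condition yields the number of elements on
  # its right-hand side, so this is true only when the match succeeded.
  if (my ($year, $month, $day) = $val =~ /^(\d{2,4})\D(\d{1,2})\D(\d{1,2})$/)
  {
    return ($year, $month, $day);
  }
  return;
}

my @parts = split_iso ("1959-06-17");
print "year=$parts[0] month=$parts[1] day=$parts[2]\n" if @parts;
```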

The library file lib/Cookbook_Utils.pm in the recipes distribution contains several of these pattern tests, packaged as functions. If the date doesn't match the pattern, they return undef. Otherwise, they return a reference to an array containing the broken-out year, month, and day values. This is useful for performing further checks on the components of the date. For example, is_iso_date() looks for dates that match ISO format. It's defined as follows:

sub is_iso_date
{
  my $s = $_[0];

  return undef unless $s =~ /^(\d{2,4})\D(\d{1,2})\D(\d{1,2})$/;
  return [ $1, $2, $3 ];  # return year, month, day
}

To use the function, do something like this:

my $ref = is_iso_date ($val);
if (defined ($ref))
{
  # $val matched ISO format pattern;
  # check its subparts using $ref->[0] through $ref->[2]
}
else
{
  # $val didn't match ISO format pattern
}
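As a rough illustration of checking the subparts returned by is_iso_date(), the following sketch adds a hypothetical helper, iso_parts_in_range(). It verifies only that the month is 1 to 12 and the day is 1 to 31. It deliberately ignores month lengths and leap years, which is why 2024-02-31 still passes. Full validation is the subject of Performing Validity Checking on Date or Time Subparts.

```perl
use strict;
use warnings;

sub is_iso_date
{
  my $s = $_[0];

  return undef unless $s =~ /^(\d{2,4})\D(\d{1,2})\D(\d{1,2})$/;
  return [ $1, $2, $3 ];  # return year, month, day
}

# Hypothetical helper: coarse range check on the parts returned by
# is_iso_date(). Month must be 1-12 and day 1-31. Month lengths and
# leap years are NOT checked.
sub iso_parts_in_range
{
  my $val = $_[0];
  my $ref = is_iso_date ($val);
  return 0 unless defined ($ref);     # didn't match ISO format pattern
  my ($year, $month, $day) = @{$ref};
  return 0 if $month < 1 || $month > 12;
  return 0 if $day < 1 || $day > 31;
  return 1;
}

for my $val ("1959-06-17", "1947-15-99", "2024-02-31", "June 17, 1959") {
  printf "%-14s %s\n", $val, iso_parts_in_range ($val) ? "ok" : "rejected";
}
```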

You’ll often find additional processing necessary with dates, because although date-matching patterns help to weed out values that are syntactically malformed, they don’t assess whether the individual components contain legal values. To do that, some range checking is necessary. That topic is covered later in Performing Validity Checking on Date or Time Subparts.

If you’re willing to skip subpart testing and just want to rewrite the pieces, you can use a substitution. For example, to rewrite values assumed to be in MM-DD-YY format into YY-MM-DD format, do this:

$val =~ s/^(\d+)\D(\d+)\D(\d+)$/$3-$1-$2/;
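Here is the same substitution wrapped in a function; mdy_to_ymd is a hypothetical name. A value that doesn't consist of three digit sequences is returned unchanged, because s/// leaves the string alone when its pattern fails.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite MM-DD-YY (any non-digit delimiters) as YY-MM-DD.
# If the pattern doesn't match, s/// changes nothing and $val is returned as-is.
sub mdy_to_ymd
{
  my $val = $_[0];
  $val =~ s/^(\d+)\D(\d+)\D(\d+)$/$3-$1-$2/;
  return $val;
}

for my $val ("06-17-59", "6/17/59", "June 17, 1959") {
  my $out = mdy_to_ymd ($val);
  print "$val -> $out\n";
}
```

Because the pattern uses \d+, parts are carried through exactly as written. As a result, 6/17/59 becomes 59-6-17, not 59-06-17.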

Time values are somewhat more orderly than dates, usually being written with hours first and seconds last, with two digits per part:

/^\d{2}:\d{2}:\d{2}$/

To be more lenient, you can allow the hours part to have a single digit, or the seconds part to be missing:

/^\d{1,2}:\d{2}(:\d{2})?$/
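To see how the strict and lenient time patterns differ, this sketch applies both to the same inputs. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers for the two time patterns.
sub strict_time  { return $_[0] =~ /^\d{2}:\d{2}:\d{2}$/ ? 1 : 0 }
sub lenient_time { return $_[0] =~ /^\d{1,2}:\d{2}(:\d{2})?$/ ? 1 : 0 }

for my $val ("09:05:30", "9:05:30", "9:05", "9:5", "25:99") {
  printf "%-9s strict=%d lenient=%d\n", $val, strict_time ($val), lenient_time ($val);
}
```

Neither pattern checks ranges, so 25:99 passes the lenient test.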

You can mark parts of the time with parentheses if you want to range-check the individual parts, or perhaps to reformat the value to include a seconds part of 00 if it happens to be missing. However, if the seconds part is optional, this requires some care with the parentheses and the ? characters in the pattern. You want the entire :\d{2} at the end of the pattern to be optional, but when the seconds part is present you don't want the : character saved in $3. To accomplish that, use (?: pat ), an alternative grouping notation that doesn't save the matched substring. Within that notation, use parentheses around the digits to save them. Then $3 will be undef if the seconds part is not present, but will contain the seconds digits otherwise:

if ($val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?$/)
{
  my ($hour, $min, $sec) = ($1, $2, $3);
  $sec = "00" if !defined ($sec); # seconds missing; use 00
  $val = "$hour:$min:$sec";
}
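The reason for the (?: pat ) grouping is easiest to see by comparing it with an ordinary capturing group. In this sketch, which uses hypothetical helper names, the two versions fill $3 differently:

- The capturing version puts :45 in $3 and the digits in $4.
- The non-capturing version puts just 45 in $3.

```perl
use strict;
use warnings;

# Hypothetical helpers: return the third capture group produced by each style.
# Ordinary capturing group around the optional part: $3 includes the colon.
sub third_capture_capturing
{
  my $val = $_[0];
  return $val =~ /^(\d{1,2}):(\d{2})(:(\d{2}))?$/ ? $3 : undef;
}

# Non-capturing (?: ) group: $3 holds only the seconds digits.
sub third_capture_noncapturing
{
  my $val = $_[0];
  return $val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?$/ ? $3 : undef;
}

for my $val ("10:30:45", "10:30") {
  printf "%-9s capturing=%s non-capturing=%s\n", $val,
    third_capture_capturing ($val) // "undef",
    third_capture_noncapturing ($val) // "undef";
}
```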

To rewrite times from 12-hour format with AM and PM suffixes to 24-hour format, you can do something like this:

if ($val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?\s*(AM|PM)?$/i)
{
  my ($hour, $min, $sec) = ($1, $2, $3);
  # supply missing seconds
  $sec = "00" unless defined ($sec);
  if ($hour == 12 && (!defined ($4) || uc ($4) eq "AM"))
  {
    $hour = "00"; # 12:xx:xx AM times are 00:xx:xx
  }
  elsif ($hour < 12 && defined ($4) && uc ($4) eq "PM")
  {
    $hour += 12;  # PM times other than 12:xx:xx
  }
  $val = "$hour:$min:$sec";
}

The time parts are placed into $1, $2, and $3, with $3 set to undef if the seconds part is missing. The suffix goes into $4 if it’s present. If the suffix is AM or missing (undef), the value is interpreted as an AM time. If the suffix is PM, the value is interpreted as a PM time.
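Packaged as a function, the 12-hour to 24-hour conversion can be tried on a few sample values. The name to_24_hour is hypothetical. Returning undef for non-matching input is an assumption; the code above simply leaves $val unchanged in that case.

The function follows the rule just described, so a value with no suffix is treated as AM. For example, 12:30 becomes 00:30:00, while 13:00 passes through as 13:00:00.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 12-hour time with optional AM/PM to 24-hour form.
sub to_24_hour
{
  my $val = $_[0];
  return undef unless $val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?\s*(AM|PM)?$/i;
  my ($hour, $min, $sec, $suffix) = ($1, $2, $3, $4);
  $sec = "00" unless defined ($sec);         # supply missing seconds
  if ($hour == 12 && (!defined ($suffix) || uc ($suffix) eq "AM"))
  {
    $hour = "00";                            # 12:xx:xx AM times are 00:xx:xx
  }
  elsif ($hour < 12 && defined ($suffix) && uc ($suffix) eq "PM")
  {
    $hour += 12;                             # PM times other than 12:xx:xx
  }
  return "$hour:$min:$sec";
}

for my $val ("12:30 AM", "1:15 PM", "12:00 PM", "11:59:59 pm", "7:45", "12:30") {
  printf "%-12s -> %s\n", $val, to_24_hour ($val);
}
```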

When you write a date or time validator, consider these questions:

- What is the basic date format? Dates come in several common styles, such as ISO (CCYY-MM-DD), U.S. (MM-DD-YY), and British (DD-MM-YY), and many other formats are possible. For example, a datafile may contain dates written as June 17, 1959 or as 17 Jun '59.
- Are trailing times allowed on dates, or perhaps required? When times are expected, is the full time required, or just the hour and minute?
- Do you allow special values like now or today?
- Are date parts required to be delimited by a certain character, such as - or /, or are other delimiters allowed?
- Are date parts required to have a specific number of digits, or may leading zeros on month and day values be missing?
- Are months written numerically, or are they represented as month names like January or Jan?
- Are two-digit year values allowed? Should they be converted to four digits? If so, what is the conversion rule? (That is, at what point within the range 00 to 99 do values change from one century to the next?)
- Should date parts be checked for validity? Patterns are extremely useful for detecting malformed values, but they may not be sufficient: a value like 1947-15-99 matches a date pattern but isn't a legal date. Pattern testing is thus most useful in conjunction with range checks on the individual parts of the date.
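As one example of handling month names, the following sketch recognizes the two sample formats mentioned above, June 17, 1959 and 17 Jun '59. The helper name month_name_date is hypothetical, and so is its return convention: a reference to [year, month, day], with the month left as its three-letter prefix. The sketch checks only the first three letters of the month name and does not convert two-digit years.

```perl
use strict;
use warnings;

# Month-name prefix, case-insensitive via /i. Only the first three letters
# are checked; the capture holds just that prefix (e.g. "Jun" from "June").
my $MON = qr/(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*/i;

# Hypothetical helper: returns [year, month, day] or undef.
sub month_name_date
{
  my $s = $_[0];
  # "June 17, 1959" or "Jun. 17 1959": month, day, optional comma, 4-digit year
  return [ $3, $1, $2 ] if $s =~ /^$MON\.?\s+(\d{1,2}),?\s+(\d{4})$/;
  # "17 Jun '59" or "17 June 1959": day, month, optional apostrophe, year
  return [ $3, $2, $1 ] if $s =~ /^(\d{1,2})\s+$MON\.?\s+'?(\d{2}|\d{4})$/;
  return undef;
}

for my $val ("June 17, 1959", "17 Jun '59", "1959-06-17") {
  my $ref = month_name_date ($val);
  print "$val -> ", (defined ($ref) ? join ("/", @{$ref}) : "no match"), "\n";
}
```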

The prevalence of these issues in data-transfer problems means that you’ll probably end up writing some of your own validators on occasion to handle very specific date formats. Later sections of this chapter can provide additional assistance. For example, Converting Two-Digit Year Values to Four-Digit Form covers conversion of two-digit year values to four-digit form, and Performing Validity Checking on Date or Time Subparts discusses how to perform validity checking on components of date or time values.
