Chapter 12

Temporal Data Types in SQL

Abstract

SQL is the first programming language to have temporal data types in it. The problem is that our calendar system is irregular.

Keywords

Y2K problem

Common Era Calendar

ISO-8601

Universal Coordinated Time

YEAR

MONTH

DAY

HOUR

MINUTE

SECOND

Timestamp

Calendar date

Ordinal date

Week date

USNO

NIST

ITS

ACTS

Time zones

CURRENT_TIMESTAMP

LOCALTIMESTAMP

INTERVAL data types

Date arithmetic

OVERLAPS() predicate

History tables

State-transition constraints

Calendar tables

Report period tables

Clifford Simak wrote a science fiction novel entitled Time is the Simplest Thing in 1977. He was wrong. And the problems did not start with the Y2K problems we had in 2000, either. The calendar is irregular and the only standard unit of time is the second, years, months, weeks, hours, minutes, and so forth are not part of the metric (SI units) system but are mentioned in the ISO standards as conventions.

SQL is the first programming language to have temporal data types in it. If COBOL had done this, we would never have had the “Y2K Crisis” in IT. SQL-92 added temporal data to the language, acknowledging what was already in most SQL product by that time. The problem is that each vendor made a trade-off internally. We will get into SQL code later, but since this is an area where people do not have a good understanding, it is better to start with foundations.

12.1 Notes on Calendar Standards

The current calendar is known as the Common Era Calendar, and not the Western, Christian, or Gregorian Calendar. We want to use a universal, nonethnic, nonreligious name for it. The abbreviations for postfixes on dates are CE and BCE for “Common Era” and “Before Common Era,” respectively. The abbreviations A.D. (Anno Domini—Latin for “in the year of Our Lord”) and B.C. (“Before Christ”) were dropped to avoid religious references.

Unfortunately, the solar year is not an even number of days; there are 365.2422 days in a year and the fraction adds up over time. This is why we have leap year in the Common Era Calendar. Leap years did not exist in the Roman or Egyptian solar calendars prior to the year 708 AUC (“ab urbe condita,” Latin for “from the founding of the City [Rome]”). As a result, they were useless for agriculture, so the Egyptians relied on the stars to predict the flooding of the Nile. To realign the calendar with the seasons, Julius Caesar decreed that the year 708 (that is, the year 46 BCE to us) would have 445 days. Caesar, on the advice of Sosigenes, also introduced leap years (known as bissextile years) at this time. Many Romans simply referred to 708 AUC as the “year of confusion” and thus began the Julian calendar that was the standard for the world from that point forward.

The Julian calendar had a leap year day every 4 years and was reasonably accurate in the short or medium range, but it drifted by approximately 3 days every 400 years. This is a result of the 0.0022 fraction of a day adding up.

It had gotten 10 days out of step with the seasons by 1582 (a calendar without a leap year would have drifted completely around slightly more than once between 708 AUC and 2335 AUC—that is, 1582 CE to us). The Summer Solstice, so important to planting crops, had no relationship to 21 June. Scientists finally convinced Pope Gregory to realign the calendar by dropping almost 2 weeks from the month of October in 1582 CE. The years 800 CE and 1200 CE were leap years anywhere in the Christian world. But whether 1600 CE was a leap year depended on where you lived. European countries did not move to the new calendar at the same time or follow the same pattern of adoption.

The calendar corrections had economic and social ramifications. In Great Britain and its colonies, September 02, 1752 was followed by September 14, 1752. The calendar reform bill of 1751 was entitled “An Act for Regulating the Commencement of the Year and for Correcting the Calendar Now in Use.” The bill included provisions to adjust the amount of money owed or collected from rents, leases, mortgages, and similar legal arrangements so that rents and so forth were prorated by the number of actual elapsed days in the time period affected by the calendar change. Nobody had to pay the full monthly rate for the short month of September in 1752 and nobody had to pay the full yearly rate for the short year.

The serious, widespread, and persistent rioting was not due to the commercial problems that resulted, but to the common belief that each person’s days were “numbered” and that everyone was preordained to be born and die at a divinely ordained time that no human agency could alter in any way.

Thus the removal of 11 days from the month of September shortened the lives of everyone on Earth by 11 days. And there was also the matter of the missing 83 days due to the change of the New Year’s Day from March 25 to January 01, which was believed to have a similar effect.

If you think this behavior is insane, consider the number of people today who get upset about the yearly 1-hour clock adjustments for Daylight Saving Time.

To complicate matters, the beginning of the year also varied from country to country. Great Britain preferred to begin the year on March 25, whereas other countries began at Easter, December 25, or perhaps March 01 and January 01—all important details for historians to keep in mind.

In Great Britain and its colonies, the calendar year 1750 began on March 25 and ended on March 25—that is, the day after March 24, 1750 and March 25, 1751. The leap year day was added to the end of the last full month in the year, which was then February. The extra leap year day comes at the end of February, as this part of the calendar structure was not changed.

In Latin, “septem” means seventh, from which we derived September. Likewise, “octem” means eighth, “novem” means ninth, and “decem” means tenth. Thus, September should be the seventh month, October should be the eighth, November should be the ninth, and December should be the tenth.

So, how come September is the ninth month? September was the seventh month until 1752 when the New Year was changed from March 25 to January 01.

Until fairly recently, nobody agreed on the proper display format for dates. Every nation seems to have its own commercial conventions. Most of us know that Americans put the month before the day and the British do the reverse, but do you know any other national conventions? National date formats may be confusing when used in an international environment. When it was ‘12/16/95’ in Boston, it was ‘16/12/95’ in London, ‘16.12.95’ in Berlin and ‘95-12-16’ in Stockholm. Then there are conventions within industries within each country that complicate matters further.

At one time, NATO tried to use Roman numerals for the month to avoid language problems among treaty members. The United States Army did a study and found that the four-digit year, three-letter month, and two-digit day format was the least likely to be missorted, misread, or miswritten by English speakers. That is also the reason for “24 hour” or “military” time.

Today, we have a standard for this: ISO-8601 “Data Elements and Interchange Formats—Information Interchange—Representation of Dates and Times” that is part of Standard SQL and other ISO standards.

The full ISO-8601 timestamp can be either a local time or a UTC time. UTC is the code for “Universal Coordinated Time,” which replaced the older GMT, which was the code for “Greenwich Mean Time,” which is still improperly used in popular media.

In 1970, the Coordinated Universal Time system was devised by an international advisory group of technical experts within the International Telecommunication Union (ITU). The ITU felt it was best to designate a single abbreviation for use in all languages in order to minimize confusion. The two alternative original abbreviation proposals for the “Universal Coordinated Time” were CUT (English: Coordinated Universal Time) and TUC (French: Temps Universel Coordinne). UTC was selected both as a compromise between the French and English proposals because the C at the end looks more like an index in UT0, UT1, UT2 and a mathematical-style notation is always the most international approach.

Universal Coordinated Time is not quite the same thing as astronomical time. The Earth wobbles a bit and the UTC had to be adjusted to the solar year with a leap second added or removed once a year to keep them in synch. As of this writing, Universal Coordinated Time will be based on an atomic clock without a leap second adjustment.

This extra second has screwed up software. In 1998, the leap second caused a mobile-phone blackout across the southern United States because different regions were suddenly operating with time differences outside the error tolerances. Then in 2012, an airline’s booking system went belly-up for hours after a leap second insertion. Most nations want to move to an atomic clock, but not all. Britain wants to keep the leap second. Most countries are happy to let the clocks drift away from “solar time.” The reason for Britain’s reticence is largely a Luddite reaction to change. In 2014, the UK government launched a public opinion poll on the issue. As of this writing, the outcome is not known, so you will have to Google it.

Another problem is the use of local time zones (four of them in the United States) and “lawful time” to worry about. This is the technical term for time required by law for commerce. Usually, this means whether or not you use Daylight Saving Time (DST) and how it is defined locally. A date without a time zone is ambiguous in a distributed system. A transaction created with DATE ‘1995-12-17’ in London may be younger than a transaction created with DATE ‘1995-12-16’ in Boston.

12.2 The Nature of Temporal Data Models

There is a joke about a father showing his young son, who has only seen digital clocks and watches, an old pocket watch with a sweeping second hand.

“What is it, Daddy?”

“It’s a watch! This is how we used to tell time.”

“HOW!? It is always changing!”

Time is not a simple thing. Most data processing done with data that is discrete by its nature. An account number is or is not equal to a value. A measurement has a value to so many decimal places. But time is a continuum, which means that given any two values on the time line, you can find an infinite number of points between them. Then we have the problem of which kind of infinite. Most nonmath majors do not even know that some transfinite numbers are bigger than others.

Do not panic. For purposes of a database, the rule we need to remember is that “Nothing happens instantaneously” in the real world. Einstein declared that duration in time is the fourth dimension that everything must have to exist. But before Einstein, the Greek philosopher Zeno of Elea (circa 490 to 430 BCE) wrote several paradoxes, but the one that will illustrate the point about a continuum versus a discrete set of points is the Arrow Paradox.

Informally, imagine you shoot an arrow into the air. It moves continuously from your bow to the target in some finite amount of time. Look at any instant in that period of time. The arrow cannot be moving during that instant because an instant has no duration as your arrow cannot be in two different places at the same time. Therefore, at every instant in time, the arrow is motionless. If this is true for all instants of time, then the arrow is motionless during the entire interval. The fallacy is that there is no such thing as an instant in time. But the Greeks only had geometry and the ideas of the continuum had to wait for Calculus. If you want more details on the topic, get a copy of A TOUR OF THE CALCULUS by David Berlinski (ISBN10: 0-679-74788-5) that traces the historical development of calculus from Zeno (about 450 BC) to Cauchy in the 19th Century.

The ISO model for temporal values is based on half-open intervals. This means there is a starting point, but the interval never gets to the ending point. For example, the day begins at ‘2016-01-01 00:00:00’ exactly, but it does not end at ‘2016-01-02 00:00:00’; instead, it approaches the start of the next day as limit. Depending on how much decimal precision we have, ‘2016-01-01 23:59:59.999..’ is the end point approximation. I just need to have at least one more fractional part than my data (Figure 12.1).

f12-01-9780128007617
Figure 12.1 Abutted Half-Open Time Intervals.

Half-open intervals can be abutted to each other to produce another half-open interval. Two overlapping half-open intervals produce a half-open interval. Likewise, if you remove a half-open interval from another half-open interval, you get one or two half-open intervals. This is called closure and it is a nice mathematical property to have.

12.3 SQL Temporal Data Types

Standard SQL has a very complete description of its temporal data types. There are rules for converting from numeric and character strings into these data types, and there is a schema table for global time-zone information that is used to make sure that temporal data types are synchronized. It is so complete and elaborate that smaller SQLs have not implemented it yet. As an international standard, SQL has to handle time for the whole world and most of us work with only local time. If you have ever tried to figure out the time in a foreign city to place a telephone call, you have some idea of what is involved.

The common terms and conventions related to time are also confusing. We talk about “an hour” and use the term to mean a particular point within the cycle of a day (The train arrives at 13:00 Hrs) or to mean an interval of time not connected to another unit of measurement (The train takes three hours to get there), the number of days in a month is not uniform, the number of days in a year is not uniform, weeks are not easily related to months, and so on.

Standard SQL has a set of date, time (DATE, TIME, and TIMESTAMP), and INTERVALs (DAY, HOUR, MINUTE, and SECOND with decimal fraction) data types. They are made up of fields which are ordered within the value; they are YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. This is the only place in SQL that we use the term “field”; new SQL programmers too often confuse the term field as used in file systems with the column concept in RDBMS. They are very different.

Both of these are temporal data types, but datetimes represent points in the time line, while the interval data types are durations of time, not anchored at a point on the timeline. Standard SQL also has a full set of operators for these data types. But you will still find vendor syntax in most SQL implementations today.

12.3.1 Tips for Handling Dates, Timestamps, and Times

The syntax and power of date, timestamp, and time features vary so much from product to product that it is impossible to give anything but general advice. This chapter will assume that you have simple date arithmetic in your SQL, but you might find that some library functions would let you do a better job than what you see here. Please continue to check your manuals until the Standard SQL operators are implemented.

As a general statement, there are two ways of representing temporal data internally. The “UNIX representation” is based on keeping a single binary string of 64 or more bits that counts the computer clock ticks from a base starting date and time. The other representation is I will call the “COBOL method,” as it uses separate fields for the year, month, day, hours, minutes, and seconds. These fields can be characters, BCD, or other another internal format.

The UNIX method is very good for calculations, but the engine must convert from the external ISO-8601 format to the internal format and vice versa. The COBOL format is the opposite, good for display purposes, but weaker on calculations.

12.3.2 Date Format Standards

There are three basic display formats in the ISO standards. They are all digits separated by punctuation of some kind.

Calendar date: It is a string of digits that is made up of the four-digit year, a dash, two-digit month, dash, and a two-digit day within the month. Example: ‘2015-06-25’ for June 25, 2015. This is the only date display format allowed in ANSI/ISO Standard SQL. However, The ISO-8601 standards allow you to drop the dashes and write the data as a string of all digits.

Ordinal date: It is a string of digits that is made up of the four-digit year, a dash, and the three-digit ordinal number of the day within the year expressed as ‘001’ through ‘365’ or ‘366’ as appropriate. Example: ‘2015-176’ for June 25, 2015.

Week date: It is a string of digits that is made up of the four-digit year, a ‘W,’ the two-digit ordinal number of the week within the year expressed as ‘01’ through ‘52’ or ‘53’ as appropriate, a dash, and a single digit from 1 to 7 for day within the week (1 = Monday, 7 = Sunday). Very often the week day is not used. Example: ‘2015W26’ is the week from ‘2015-06-22’ to ‘2015-06-28,’ which includes ‘2015W26-05’ for June 25, 2015.

Weeks do not align to calendar dates, so a week can cross over year boundaries. The first week of the year is the week that contains that year’s first Thursday (first four-day week of the year). The highest week number in a year is either 52 or 53.

Time Periods: Longer periods can be modeled by truncating the finer fields in these display formats. Thus a whole year can be shown with four digits, the year-month period with “yyyy-mm,” and we are also ready to show a whole week. None of this is part of SQL but the concept is useful, and you may have to invent your own names for such things in look-up tables.

I like the MySQL convention of using double zeroes for months and years, That is, ‘yyyy-mm-00’ for a month within a year and ‘yyyy-00-00’ for the whole year. The advantages are that it will sort with the ISO-8601 data format required by Standard SQL and it is language independent. The regular expression patterns for validation are ‘[12][0-9][0-9][0-9]-00-00’ and ‘[12][0-9][0-9][0-9]-[01][0-9]-00,’ respectively. You will need to create a look-up table with the time period name and the start and end timestamps for it.

12.3.3 Time Format Standards

TIME(n) is made up of a two-digit hour between ‘00’ and ‘23,’ colon, a two-digit minute between ‘00’ and ‘59,’ colon, and a two-digit second between ‘00’ and ‘59’ or ‘60,’ if the leap second is still in use. Seconds can also have decimal places shown by (n) from zero to an implementation-defined accuracy. The FIPS-127 standard requires at least five decimal places after the second and modern products typically go to seven decimal places.

We do not use the old AM and PM postfixes any ISO Standards. There is no such time as 24:00:00; this 00:00:00 of the next day. However, some SQLs will accept 24:00:00 as input and put it in the proper format. These values make little sense outside of a timestamp.

TIMESTAMP(n) values is made up of a date, a space, and a time. The ISO standards allow the space to be replaced by the letter ‘T’ to put the timestamp into a single string and for the punctuation to be removed. The SQL standard does not.

Remember that a TIMESTAMP will read the system clock once and use that same time on all the items involved in a transaction. It does not matter if the actual time it took to complete the transaction was days; a transaction in SQL is done as a whole unit or is not done at all. This is not usually a problem for small transactions, but it can be in large batched ones where very complex updates have to be done.

TIMESTAMP as a source of unique identifiers is fine in many single-user systems, as all transactions are serialized and of short enough duration that the clock will change between transactions-peripherals are slower than CPUs. But in a client/server system, two transactions can occur at the same time on different local workstations. Using the local client machine clock can create duplicates and adds the problem of coordinating all the clients. The coordination problem has two parts:

1. How do you get the clocks to start at the same time? I do not mean just the technical problem of synchronizing multiple machines to the microsecond but also the one or two clients who forgot about Daylight Saving Time.

2. How do you make sure the clocks stay the same? Using the server clock to send a timestamp back to the client increases network traffic yet does not always solve the problem.

The modern solution is to use the NIST time signal to set and synchronize all clocks, not just those in computers. Official U.S. Government time, as provided by NIST and USNO (U.S. Naval Observatory), is available on the Internet at http://www.time.gov. NIST also offers an Internet Time Service (ITS) and an Automated Computer Time Service (ACTS) that allow setting of computer and other clocks through the Internet or over standard commercial telephone lines. Free software for using these services on several types of popular computers can be downloaded there. The NIST Website has information on time and frequency standards and research.

Many operating systems represent the system time as a long binary string based on a count of machine cycles since a starting date. One trick is to pull off the least significant digits of this number and use them as a key. But this will not work as transaction volume increases. Adding more decimal places to the timestamp is not a solution either. The real problem lies in statistics.

Open a telephone book (white pages) at random. Mark the last two digits of any 13 consecutive numbers, which will give you a sample of numbers between 00 and 99. What are the odds that you will have a pair of identical numbers? It is not 1 in 100, as you might first think. Start with one number and add a second number to the set; the odds that the second number does not match the first are 99/100. Add a third number to the set; the odds that it matches neither the first nor the second number are 98/100. Continue this line of reasoning and compute (0.99 * 0.98 * entity * 0.88) = 0.4427 as the odds of not finding a pair. Therefore, the odds that you will find a pair are 0.5572, a bit better than even. By the time you get to 20 numbers, the odds of a match are about 87%; at 30 numbers, the odds exceed a 99% probability of one match. You might want to carry out this model for finding a pair in three-digit numbers and see when you pass the 50% mark.

A good key generator needs to eliminate (or at least minimize) identical keys and give a statistical distribution that is fairly uniform to avoid excessive index reorganization problems. Most key-generator algorithms that use the system clock depend on one or more “near key” values, such as employee name, to create a unique identifier.

The mathematics of such algorithms is much like that of a hashing algorithm. Hashing algorithms also try to obtain a uniform distribution of unique values. The difference is that a hashing algorithm must ensure that a hash result is both unique (after collision resolution) and repeatable so that it can find the stored data. A key generator needs only to ensure that the resulting key is unique in the database, which is why it can use the system clock and a hashing algorithm cannot.

You can often use a random-number generator in the host language to create pseudo-random numbers to insert into the database for these purposes. Most pseudo-random number generators will start with an initial value, called a seed, and then use it to create a sequence of numbers. Each call will return the next value in the sequence to the calling program. The sequence will have some of the statistical properties of a real random sequence, but the same seed will produce the same sequence each time, which is why the numbers are called pseudo-random numbers. This also means that if the sequence ever repeats a number it will begin to cycle. (This is not usually a problem, as the size of the cycle can be hundreds of thousands or even millions of numbers.)

12.3.4 Basic Time

You should use a “24-hour” time format, which is less prone to errors than 12-hour (AM/PM) time, as it is less likely to be misread or miswritten. This format can be manually sorted more easily and is less prone to computational errors. Americans use a colon as a field separator between hours, minutes, and seconds; some Europeans use a period. This is not a problem for them, as they also use a comma for a decimal point.

One of the major problems with time is that there are three kinds: fixed events (He arrives at 13:00 Hrs), intervals (The trip takes three hours), and durations (The train leaves at 10:00 Hrs and arrives at 13:00 Hrs)—which are all interrelated. An INTERVAL is a unit of duration of time rather than a fixed point in time. This comes from the fact that time is an interval scale.

12.3.5 Time Zones

Older, smaller databases live and work in one time zone. The system clock is set to local time and the DBA ignores the complications like leap seconds, DST, and time zones. Standard SQL uses only UTC and converts it to local time with TIMEZONE_HOUR and TIMEZONE_MINUTE fields at the end. These fields give the time-zone displacement for local times in that column or temporal variable.

There are also three-letter and four-letter codes for the time zones of the world, such as EST, for Eastern Standard Time, in the United States. But these codes are not universal. For example, all of these time zones are UTC—3 hours

Time-Zone CodeTime-Zone NameWhere Used
ADTAtlantic Daylight TimeAtlantic
BRTBrasília TimeSouth America
CLSTChile Summer TimeSouth America
GFTFrench Guiana TimeSouth America
WGTWest Greenland TimeNorth America

The closest thing to a universal naming convention for time zones is the Military alphabet code.

Time-Zone CodeTime-Zone NameDisplacement from UTC
AAlpha Time Zone+ 1 hour
BBravo Time Zone+ 2 hours
CCharlie Time Zone+ 3 hours
DDelta Time Zone+ 4 hours
EEcho Time Zone+ 5 hours
FFoxtrot Time Zone+ 6 hours
GGolf Time Zone+ 7 hours
HHotel Time Zone+ 8 hours
IIndia Time Zone+ 9 hours
KKilo Time Zone+ 10 hours
LLima Time Zone+ 11 hours
MMike Time Zone+ 12 hours
NNovember Time Zone− 1 hour
OOscar Time Zone− 2 hours
PPapa Time Zone− 3 hours
QQuebec Time Zone− 4 hours
RRomeo Time Zone− 5 hours
SSierra Time Zone− 6 hours
TTango Time Zone− 7 hours
UUniform Time Zone− 8 hours
VVictor Time Zone− 9 hours
WWhiskey Time Zone− 10 hours
XX-ray Time Zone− 11 hours
YYankee Time Zone− 12 hours
ZZulu Time ZoneUTC

This is why UTC is sometimes called “Zulu time” and the letter Z is used as punctuation between the timestamp and the displacement in ISO-8601.

The offset is usually a positive or negative number of hours, but there are still a few odd zones that differ by 15 or 30 minutes from the expected pattern.

The TIMESTAMP data type is a DATE and a TIME put together in one values (e.g., ‘2017-05-03 05:30:06.123’). There are some variations from DBMS to DBMS though. For example, the time component of DB2 TIMESTAMP data is configurable and can be more precise than DB2 TIME data. This is what CURRENT_TIMESTAMP returns from the system clock in a program, query, or statement. However, SQL dialects will still use NOW, getdate(), and other proprietary reserved words.

TIMESTAMP WITH TIME ZONE or TIMESTAMPTZ data types are a TIMESTAMP but include the time zone’s displacement from UTC. The standards allow for TZD (time-zone designator), ‘Z’ or a positive or negative hour to minute interval (+ hh:mm or − hh:mm). Standard SQL uses the last option.

CURRENT_TIMESTAMP is a representation of the current date and time with time zone. LOCALTIMESTAMP is a representation of the current date and time but without a time zone.

Now you have to factor in Daylight Saving Time on top of that to get what is called “lawful time” which is the basis for legal agreements. The U.S. government uses DST on federal lands inside of states that do not use DST. You can get a mix of gaps and duplicate times in the local lawful time display over a year. This is why Standard SQL uses UTC internally.

Vendors often have a system configuration parameter to set the local time zone and other options. You need to know your SQL and not get caught in this.

12.4 INTERVAL Data Types

INTERVAL data types are used to represent temporal duration. They come in two basic types, intervals that deal with the calendar and those that deal with the clock. The year-month intervals have an express or implied precision that includes no fields other than YEAR and MONTH, though it is not necessary to use both. The other class, called day-time intervals, has an express or implied interval precision that can include any fields other than YEAR or MONTH—that is, DAY, HOUR, MINUTE, and SECOND (with decimal places).

The units of time in an SQL temporal value are called fields; do not confuse this with the term “fields” are use with non_RDBMS file systems. The fields in the interval have to be in high to low order without missing fields.

FieldInclusive Value Limit
YEAR'0001' to '9999'; follows the ISO-8601 Standard
MONTH'01' to '12'; the value 12 might be rounded to 1 year
DAY'01' to '31'; must be valid for month and year
HOUR'00' to '23'; value 24 might be rounded to the day
MINUTE'00' to '59'; watch for leap seconds!
SECOND'00' to '59.999..'; precision is implementation defined

SECOND are integers and have precision 2 when not the first field. SECOND, however, can be defined to have an < interval fractional seconds precision > that indicates the number of decimal digits maintained following the decimal point in the seconds value. When not the first field, SECOND has a precision of two places before the decimal point.

The datetime literals are not surprising in that they follow the syntax used by ISO-8601 Standards with the dashes between the fields in dates and colons between the field times. The strings are always quoted. The interval qualifier follows the keyword INTERVAL when specifying an INTERVAL data type.

The following table lists the valid interval qualifiers for YEAR-MONTH intervals:

Interval QualifierDescription
YEARAn interval class describing a number of years
MONTHAn interval class describing a number of months
YEAR TO MONTHAn interval class describing a number of years and months

The following table lists the valid interval qualifiers for DAY-TIME intervals:

Interval QualifierDescription
DAYPlus or minus a number of days
HOURPlus or minus a number of hours
MINUTEPlus or minus a number of minutes
SECOND(s)Plus or minus a number of seconds (decimals are allowed)
DAY TO HOURPlus or minus a number of days and hours
DAY TO MINUTEPlus or minus a number of days, hours, and minutes
DAY TO SECOND(s)Plus or minus a number of days, hours, minutes, and seconds
HOUR TO MINUTEPlus or minus a number of hours and minutes
HOUR TO SECOND(s)Plus or minus a number of hours, minutes, and seconds
MINUTE TO SECOND(s)Plus or minus a number of minutes and seconds

Here is a sample query that shows all of the INTERVAL types in use:

SELECT CURRENT_TIMESTAMP + INTERVAL '+7' YEAR,
 CURRENT_TIMESTAMP + INTERVAL '-3' MONTH,
 CURRENT_TIMESTAMP + INTERVAL '0007 03' YEAR TO MONTH,
 CURRENT_TIMESTAMP + INTERVAL '+5' DAY,
 CURRENT_TIMESTAMP + INTERVAL '-5' HOUR,
 CURRENT_TIMESTAMP + INTERVAL '12' MINUTE,
 CURRENT_TIMESTAMP + INTERVAL '3' SECOND,
 CURRENT_TIMESTAMP + INTERVAL '1 12' DAY TO HOUR,
 CURRENT_TIMESTAMP + INTERVAL '1 12:35' DAY TO MINUTE,
 CURRENT_TIMESTAMP + INTERVAL '1 12:35:45' DAY TO SECOND,
 CURRENT_TIMESTAMP + INTERVAL '01:12' HOUR TO MINUTE,
 CURRENT_TIMESTAMP + INTERVAL '01:12:35' HOUR TO SECOND,
 CURRENT_TIMESTAMP + INTERVAL '01:12' MINUTE TO SECOND
FROM Dummy;

Notice that the quoted strings in the HOUR TO MINUTE and MINUTE TO SECOND example are the same but have different meanings. A timestamp literal can also include a time-zone interval to change it from a UTC time to a local time.

12.5 Queries with Date Arithmetic

Almost every SQL implementation has a DATE data type, but the proprietary functions available for them vary quite a bit. The most common ones are a constructor that builds a date from integers or strings; extractors to pull out the month, day, or year; and some display options to format output.

You can assume that your SQL implementation has simple date arithmetic functions, although with different syntax from product to product, such as

1. A date plus or minus a number of days yields a new date.

2. A date minus a second date yields an integer number of days between the dates.

Here is a table of the valid combinations of < datetime > and < interval > data types in the Standard SQL standard:

< datetime > − < datetime > = < interval >

< datetime > + < interval > = < datetime >

< interval > (* or/) < numeric > = < interval >

< interval > + < datetime > = < datetime >

< interval > + < interval > = < interval >

< numeric > * < interval > = < interval >

There are other rules, which deal with time zones and the relative precision of the two operands that are intuitively obvious.

The Standard CURRENT_DATE function that returns the current date from the system clock. However, you will still find vendor dialects with odd names for the same function, such as TODAY, SYSDATE, Now(), and getdate(). There may also be a function to return the day of the week from a date, which is sometimes called DOW() or WEEKDAY(). Standard SQL provides for CURRENT_DATE, CURRENT_TIME [(< time precision >)], CURRENT_TIMESTAMP [(< timestamp precision >)], and LOCALTIMESTAMP functions, which are self-explanatory.

12.6 Use of NULL for “Eternity”

The temporal model in SQL does not have a symbol for “eternity in the future” or “eternity in the past,” so you have to work around it for some applications. The IEEE floating point standard does have both a “−inf” and “+inf” symbol to handle this problem in the continuum model for real numbers. In fact, SQL can “only” represent timestamps in the range of years from 0001 CE up to 9999 CE. Usually, this range is good enough for most applications outside of archeology.

For example, when someone checks into a hotel, we know their arrival data, but we not know their departure date (an expected departure date is not the same thing as an actual one). All we know for certain is that it has to be after their arrival date. A NULL will act as a “place holder” until we get the actual departure date. The skeleton DDL for such a table would look like this:

CREATE TABLE Hotel_Register
(patron_id INTEGER NOT NULL
REFERENCES Patrons (patron_id),
  arrival_date TIMESTAMP(0) DEFAULT CURRENT_TIMESTAMP NOT NULL,
departure_date TIMESTAMP(0), -- null means guest still here
CONSTRAINT arrive_before_depart
CHECK (arrival_date <= departure_date),
..);

When getting reports, you will need to use the current timestamp in place of the NULL to accurately report the current billing.

 SELECT patron_id, arrival_date,
COALESCE (CURRENT_TIMESTAMP, departure_date)
AS departure_date
FROM HotelRegister
 WHERE ..;

12.7 The OVERLAPS() Predicate

The OVERLAPS() predicate is a feature still not available in most SQL implementations because it requires more of the Standard SQL temporal data features than most implementations have. You can “fake it” in many products with the BETWEEN predicate and careful use of constraints.

The result of the < OVERLAPS predicate > is formally defined as the result of the following expression:

 (S1 > S2 AND NOT (S1 >= T2 AND T1 >= T2))
 OR (S2 > S1 AND NOT (S2 >= T1 AND T2 >= T1))
 OR (S1 = S2 AND (T1 <> T2 OR T1 = T2))

where S1 and S2 are the starting times of the two time periods and T1 and T2 are their termination times. The rules for the OVERLAPS() predicate sound like they should be intuitive, but they are not. The principles that we wanted in the Standard were:

1. A time period includes its starting point but does not include its end point. We have already discussed this model and its closure properties.

2. If the time periods are not “instantaneous,” they overlap when they share a common time period.

3. If the first term of the predicate is an INTERVAL and the second term is an instantaneous event (a < datetime > data type), they overlap when the second term is in the time period (but is not the end point of the time period). That follows the half-open model.

4. If the first and second terms are both instantaneous events, they overlap only when they are equal.

5. If the starting time is NULL and the finishing time is a < datetime > value, the finishing time becomes the starting time and we have an event. If the starting time is NULL and the finishing time is an INTERVAL value, then both the finishing and starting times are NULL.

Please consider how your intuition reacts to these results, when the granularity is at the YEAR-MONTH-DAY level. Remember that a day begins at 00:00:00 Hrs.

(today, today) OVERLAPS (today, today) = TRUE

(today, tomorrow) OVERLAPS (today, today) = TRUE

(today, tomorrow) OVERLAPS (tomorrow, tomorrow) = FALSE

(yesterday, today) OVERLAPS (today, tomorrow) = FALSE

Alexander Kuznetsov wrote this idiom for History Tables in T-SQL, but it generalizes to any SQL. It builds a temporal chain from the current row to the previous row. With a self-reference. This is easier to show with code:

CREATE TABLE Tasks
(task_id INTEGER NOT NULL,
 task_score CHAR(1) NOT NULL,
 previous_end_date DATE, -- null means first task
 current_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
 CONSTRAINT previous_end_date_and_current_start_in_sequence
 CHECK (prev_end_date <= current_start_date)
 DEFERRABLE INITIALLY IMMEDIATE,
 current_end_date DATE, -- null means unfinished current task
 CONSTRAINT current_start_and_end_dates_in_sequence
 CHECK (current_start_date <= current_end_date),
 CONSTRAINT end_dates_in_sequence
 CHECK (previous_end_date <> current_end_date),
 PRIMARY KEY (task_id, current_start_date),
 UNIQUE (task_id, previous_end_date), -- null first task
 UNIQUE (task_id, current_end_date), -- one null current task
 FOREIGN KEY (task_id, previous_end_date) -- self-reference
 REFERENCES Tasks (task_id, current_end_date));

Well, that looks complicated. Let’s look at it column by column. Task_id explains itself. The previous_end_date will not have a value for the first task in the chain, so it is NULL-able. The current_start_date and current_end_date are the same data elements, temporal sequence and PRIMARY KEY constraints we had in the simple history table schema.

The two UNIQUE constraints will allow one NULL in their pairs of columns and prevent duplicates. Remember that UNIQUE is NULL-able, not like PRIMARY KEY, which implies UNIQUE NOT NULL.

Finally, the FOREIGN KEY is the real trick. Obviously, the previous task has to end when the current task started for them to abut, so there is another constraint. This constraint is a self-reference that makes sure this is true. Modifying data in this type of table is easy but requires some thought.

Just one little problem with that FOREIGN KEY constraint. It will not let you put the first task into the table. There is nothing for the constraint to reference. In Standard SQL, we can declare constraints to be DEFERABLE with some other options. The idea is that you can turn a constraint ON or OFF during a session, so the database can be in state that would otherwise be illegal. But at the end of the session, all constraints have to be TRUE or UNKNOWN.

When a disabled constraint is re-enabled, the database does not check to ensure any of the existing data meets the constraints. You will want to hide this in a procedure body to get things started.

12.8 State-Transition Constraints

Transition constraints force status changes to occur in a particular order over time. A state-transition diagram is the best to show the rules. There is at least one initial state, flow lines that show what are the next legal states, and one or more termination states. Here is a simple state change diagram of possible marital states (Figure 12.2).

f12-02-9780128007617
Figure 12.2 Simple Marital State Change Diagram.

This state-transition diagram was deliberately simplified, but it is good enough to explain principles. To keep the discussion as simple as possible, my table is for only one person’s marital status over his life. Here is a skeleton DDL with the needed FOREIGN KEY reference to valid state changes and the date that the current state started.

CREATE TABLE MyLife
(previous_state VARCHAR(10) NOT NULL,
 current_state VARCHAR(10) NOT NULL,
 CONSTRAINT Improper_State_Change
 FOREIGN KEY (previous_state, current_state)
 REFERENCES StateChanges (previous_state, current_state),
start_date DATE NOT NULL PRIMARY KEY,
--etc.
);

What is not shown on it are which nodes are initial states (in this case “Born”) and which are terminal or final states (in this case “Dead,” a very terminal state of being). A terminal node can be the current state of a middle node, but not a prior state. Likewise, an initial node can be the prior state of a middle node, but not the current state. I did not write any CHECK() constraints for those conditions. It is easy enough to write a quick query with an EXISTS() predicate to do this, and I will leave that as an exercise for the reader. Let’s load the diagram into an auxiliary table with some more constraints.

CREATE TABLE StateChanges
(previous_state VARCHAR(10) NOT NULL,
 current_state VARCHAR(10) NOT NULL,
 PRIMARY KEY (previous_state, current_state),
 state_type CHAR(1) DEFAULT 'M' NOT NULL
 CHECK (state_type IN ('I', 'T', 'M')), -- initial, terminal, middle
 CONSTRAINT Node_Type_Violations
 CHECK (CASE WHEN state_type IN ('I', 'T')
AND previous_state = current_state
THEN 'T'
WHEN state_type = 'M'
AND previous_state <> current_state
THEN 'T' ELSE 'F' END = 'T'));
INSERT INTO StateChanges
 VALUES ('Born', 'Born', 'I'), -- initial state
('Born', 'Married', 'M'),
('Born', 'Dead', 'M'),
('Married', 'Divorced', 'M'),
('Married', 'Dead', 'M'),
('Divorced', 'Married', 'M'),
('Divorced', 'Dead', 'M'),
('Dead', 'Dead', 'T'), -- terminal state

We want to see a temporal path from an initial state to a terminal state. State changes do not happen all at once but are spread over time. Some of the changes are controlled by time, some by an agent. I cannot get married immediately after being born but have to wait to be of legal age. Then I have to consent.

For a real production system, you would need a more state pairs, but it is easy to expand the table.

CREATE PROCEDURE Change_State
(in_change_date DATE,
 in_change_state VARCHAR(10))
LANGUAGE SQL
DETERMINISTIC
BEGIN
DECLARE most_recent_state VARCHAR(10);
SET most_recent_state
= (SELECT current_state
FROM MyLife
 WHERE start_date
= (SELECT MAX(start_date) FROM MyLife));
-- insert initial state if empty
IF NOT EXISTS (SELECT * FROM MyLife)
AND in_change_state
IN (SELECT previous_state
 FROM StateChanges
 WHERE state_type = 'I')
THEN
INSERT INTO MyLife (previous_state, current_state, start_date)
VALUES (in_change_state, in_change_state, in_change_date);
END IF;
-- must be a real state change
IF in_change_state = most_recent_state
THEN SIGNAL SQLSTATE '75002'
SET MESSAGE_TEXT = 'This does not change the state.';
END IF;
-- must move forward in time
IF in_change_date <= (SELECT MAX(start_date) FROM MyLife)
THEN SIGNAL SQLSTATE '75003'
SET MESSAGE_TEXT = 'Violates time sequence.';
END IF;
INSERT INTO MyLife (previous_state, current_state, start_date)
VALUES (most_recent_state, in_change_state, in_change_date);
END;

The first block of code locates the most recent state of my life based on the date. The second block of code will insert an initial state if the table is empty. This is a safety feature but there probably ought to be a separate procedure to create the set of initial states. The new state has to be an actual change, so there is a block of code to be sure. The changes have to move forward in time. Finally, we build a row using the most recent state as the new previous state, the input change state and the date. If the state change is illegal, the FOREIGN KEY is violated and we get an error.

If you had other business rules, you could also add them to the code in the same way. You should have noticed that if someone makes changes directly to the MyLife Table, they are pretty much free to screw up the data. It is a good idea to have a procedure that checks to see that MyLife is in order. Let’s load the table with bad data:

INSERT INTO MyLife (previous_state, current_state, start_date)
VALUES ('Born', 'Married', '1990-09-05'),
('Married', 'Divorced', '1999-09-05'),
('Married', 'Dead', '2010-09-05'),
('Dead', 'Dead', '2011-05-10'),
('Dead', 'Dead', '2012-05-10'),

This poor guy popped into existence without being properly born, committed bigamy and died twice. And you think your life is tough. You will need a simple validation procedure to catch those errors.

What is still missing is the temporal aspect of state changes. In this example, the (‘Born,’ ‘Married’) change would have to deal with the minimum age of consent. The (‘Married,’ ‘Divorced’) change often has a legal waiting period. While technically a business rule, you know that no human being has lived over 150 years, so a gap that size is a data error. The terminal and initial states are instantaneous, however. Let’s add more flesh to the skeleton table:

CREATE TABLE StateChanges
(previous_state VARCHAR(10) NOT NULL,
 current_state VARCHAR(10) NOT NULL,
 PRIMARY KEY (previous_state, current_state),
 state_type CHAR(1) DEFAULT 'M' NOT NULL
 CHECK (state_type IN ('I', 'T', 'M')), -- initial, terminal, middle
 state_duration INTEGER NOT NULL -- unit of measure is months
 CHECK (state_duration >= 0),
 CONSTRAINT Node_type_violations
 CHECK (CASE WHEN state_type IN ('I', 'T')
AND previous_state = current_state
THEN 'T'
WHEN state_type = 'M'
AND previous_state <> current_state
THEN 'T' ELSE 'F' END = 'T')
);

To make up some data, let’s assume that the age of consent is 18 (12 months * 18 years = 216), that you have to wait 3 months into your marriage before getting a divorce, and that you have to be divorced 2 months before you can remarry. Of course, you can die instantly.

INSERT INTO StateChanges
 VALUES ('Born', 'Born', 'I', 0), -- initial state
('Born', 'Married', 'M', 216),
('Born', 'Dead', 'M', 0),
('Married', 'Divorced', 'M', 3),
('Married', 'Dead', 'M', 0),
('Divorced', 'Married', 'M', 2),
('Divorced', 'Dead', 'M', 0),
('Dead', 'Dead', 'T', 0); -- terminal state

The first question is where to check for temporal violations; during insertion or with validation procedures? My answer is both. Whenever possible, do not knowingly put bad data into a schema, so this should be done in the ChangeState() procedure. But someone or something will subvert the schema, and you have to be able to find and repair the damage.

A lot of commercial situations have a fixed lifespan. Warranties, commercial offers, and bids expire in a known number of days. This means adding another column to the StateChanges table that tells the insertion program if the expiration date is optional (shown with a NULL) or mandatory (computed from the duration).

Here is some skeleton DDL for a bid application to explain this better.

CREATE TABLE MyBids
(bid_nbr INTEGER NOT NULL,
 previous_state VARCHAR(10) NOT NULL,
 current_state VARCHAR(10) NOT NULL,
 CONSTRAINT Improper_State_Change
 FOREIGN KEY (previous_state, current_state)
 REFERENCES StateChanges (previous_state, current_state),
start_date DATE NOT NULL PRIMARY KEY,
expiry_date DATE, -- null means still open.
CHECK (start_date <= expiry_date),
 PRIMARY KEY (bid_nbr, start_date),
 etc.
);

The DDL has a bid number as the primary key and a new column for the expiration date. Obviously, the bid has to exist for a while, so add a constraint to keep the date order right.

CREATE TABLE StateChanges
(previous_state VARCHAR(10) NOT NULL,
 current_state VARCHAR(10) NOT NULL,
 PRIMARY KEY (previous_state, current_state),
 state_duration INTEGER NOT NULL,
 duration_type CHAR(1) DEFAULT 'O' NOT NULL
CHECK ('O', 'M')), -- optional, mandatory
 etc.
);

The DDL for the state changes gets a new column to tell us if the duration is optional or mandatory. The insertion procedure is a bit trickier. The VALUES clause has more power than most programmers use. The list can be more than just constants or simple scalar variables. But using CASE expressions lets you avoid if-then-else procedural logic in the procedure body.

All it needs is the bid number and what state you want to use. If you don’t give me a previous state, I assume that this is an initial row and repeat the current state you just gave me. If you don’t give me a start date, I assume you want the current date. If you don’t give me an expiration date, I construct one from the State Changes table with a scalar subquery. Here is the skeleton DDL for an insertion procedure.

12.9 Calendar Tables

Because the calendar and temporal math are so irregular, build axillary tables for various single day calendars and for period calendars, one column for the calendar date and other columns to show whatever your business needs in the way of temporal information. Do not try to calculate holidays in SQL—Easter alone requires too much math. Oh, which Easter? Catholic or Orthodox?

1. Derek Dongray came up with a classification of the public holidays and weekends he needed to work within multiple countries. Here is his list with more added.

2. Fixed date every year.

3. Days relative to Easter.

4. Fixed date but will slide to next Monday if on a weekend.

5. Fixed date but slides to Monday if Saturday or Tuesday if Sunday (UK Boxing Day is the only one).

6. Specific day of week after a given date (usually first/last Monday in a month but can be other days, e.g., First Thursday after November 22 = Thanksgiving).

7. Days relative to Greek Orthodox Easter (not always the same as Western Easter) Fixed date in Hijri (Muslim) Calendar—this turns out to only be approximate due to the way the calendar works. An Imam has to see a full moon to begin the cycle and declare it.

8. Days relative to previous Winter Solstice (Chinese holiday of Qing Ming Ji). Civil holidays set by decree, such as a National Day of Mourning due to an unscheduled event.

9. Fixed date except Saturday slides to Friday, and Sunday slides to Monday.

10. Fixed date, but Tuesday slides to Monday, and Thursday to Friday. The day Columbus discovered America is a national holiday in Argentina. Except when it’s a Tuesday, they back it one day to Monday.

As you can see, some of these are getting a bit esoteric and a bit fuzzy. A calendar table for U.S. Secular holidays can be built from the data at http://www.smart.net/~mmontes/ushols.html.

You will probably want a fiscal calendar in this table. Which fiscal calendar? The GAAP (General Accepted Accounting Practices) lists over a hundred of them.

I would add the ordinal date and week date, which we discussed earlier. They make temporal math much easier.

The Julian business day is a good trick. Number the days from whenever your calendar starts and repeat a number for a weekend or company holiday. This counts business days.

CREATE TABLE Calendar
CREATE TABLE Calendar
(cal_date DATE NOT NULL PRIMARY KEY,
 julian_business_nbr INTEGER NOT NULL,
 . . .);
INSERT INTO Calendar VALUES ('2007-04-05', 42),
 ('2007-04-06', 43), -- good Friday
 ('2007-04-07', 43),
 ('2007-04-08', 43), -- Easter Sunday
 ('2007-04-09', 44),
 ('2007-04-10', 45); --Tuesday

To compute the business days from Thursday of this week to next Tuesdays:

SELECT (C2.julian_business_nbr - C1.julian_business_nbr)
FROM Calendar AS C1, Calendar AS C2
 WHERE C1.cal_date = '2007-04-05',
 AND C2.cal_date = '2007-04-10';

12.9.1 Report Period Tables

Since SQL is a database language, we prefer to do look ups and not calculations. They can be optimized while temporal math messes up optimization. A useful idiom is a report period calendar that everyone uses, so there is no way to get disagreements in the DML.

The report period table gives a name to a range of dates that is common to the entire enterprise.

CREATE TABLE Something_Report_Periods
(something_report_name CHAR(10) NOT NULL PRIMARY KEY
 CHECK (something_report_name LIKE < pattern >),
 something_report_start_date DATE NOT NULL,
 something_report_end_date DATE NOT NULL,
CONSTRAINT date_ordering
CHECK (something_report_start_date <= something_report_end_date),
etc);

These report periods can overlap or have gaps. Avoid period names that are language dependent; they will have trouble porting. If possible the periods name should sort in temporal order. Again, I like the MySQL convention of using double zeroes for months and years, That is, ‘yyyy-mm-00’ for a month within a year and ‘yyyy-00-00’ for the whole year. The advantages are that it will sort with the ISO-8601 date format and will go to the top of each year and month within the year.

Overlapping periods are useful for reporting things like sales promotions. You can quickly see if the overlap between your “Bikini Madness Week” and “Three Day Suntan Lotion Promotion” helped increase total sales.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.220.108.111