Chapter 19

BETWEEN and OVERLAPS Predicates

Abstract

The BETWEEN and OVERLAPS predicates both offer a shorthand way of showing that one value lies within a range defined by two other values. BETWEEN works with scalar range limits of any type; the OVERLAPS predicate looks at two time periods.

Keywords

BETWEEN

OVERLAPS

SYMMETRIC

ASYMMETRIC

NULLs

INTERVAL

The between and OVERLAPS predicates both offer a shorthand way of showing that one value lies within a range defined by two other values. BETWEEN works with scalar range limits of any type; the OVERLAPS predicate looks at two time periods (defined either by start and end points or by a starting time and an INTERVAL) to see if they overlap in time.

19.1 The BETWEEN Predicate

The BETWEEN predicate is a feature of SQL that is used enough to deserve special attention. It is also just tricky enough to fool beginning programmers.

< between predicate >
  ::= < row value predicand > < between predicate part 2 >
< between predicate part 2 > ::=
[NOT] BETWEEN [ASYMMETRIC | SYMMETRIC]
< row value predicand > AND < row value predicand >

If neither SYMMETRIC nor ASYMMETRIC is specified, then ASYMMETRIC is implicit. This is the original definition of the shorthand that only worked with single valued predicands. This is not likely what you will see in your product.

Let x, y, and z be the first, second, and third row value predicands, respectively, so we can start defining this predicate.

1. x NOT BETWEEN SYMMETRIC y AND z

means

NOT (x BETWEEN SYMMETRIC y AND z)

No surprises here, since that is how SQL has handled optional NOT in all constructs.

2. X BETWEEN SYMMETRIC y AND z

means

((x BETWEEN ASYMMETRIC y AND z)
OR (x BETWEEN ASYMMETRIC z AND y))

There is a historical note about the early days of ANSI X3H2. We voted to make this the definition of “x BETWEEN y AND z” at one meeting. This was revoked at the next committee meeting, but Microsoft had gone ahead and changed it in their ACCESS database product. They failed to read the follow up papers.

3. x NOT BETWEEN ASYMMETRIC y AND z

means

NOT (x BETWEEN ASYMMETRIC y AND z)

No surprises here, since that is how SQL has handled optional NOT in all constructs.

4. x BETWEEN ASYMMETRIC y AND z

means

x >= y AND x <= z

Please note that the end points are included in this definition. This predicate works with any data types that can be compared. Most programmers miss this fact and use it only for numeric values. It can be used for character strings and temporal data as well. The < row value predicand>s can be expressions or constants, but again programmers tend to use just constants or column names.

Many optimizers will take special steps for the BETWEEN because SQL implementations often use B + and other tree indexes that have ranges in their nodes. It is also more human readable than its definition and shows a higher level of abstraction.

19.1.1 Results with NULL Values

The results of this predicate with NULL values for the < row value predicand>s follow directly from the definition. If either < row value predicand>s is NULL, the result is UNKNOWN for any value of < value expression >.

19.1.2 Results with Empty Sets

Notice that if first second < row value predicand > is less than the third < row value predicand >, the ASYMMETRIC expression will always be FALSE unless the value is NULL; then it is UNKNOWN. That is a bit confusing, since there is no value to which < value expression > could resolve itself that would produce a TRUE result. But this follows directly from expanding the definition.

x BETWEEN ASYMMETRIC 12 AND 15 -- depends on x
x BETWEEN ASYMMETRIC 15 AND 12 -- always FALSE
x BETWEEN ASYMMETRIC NULL AND 15 -- always UNKNOWN
NULL BETWEEN ASYMMETRIC 12 AND 15 -- always UNKNOWN
x BETWEEN ASYMMETRIC 12 AND NULL -- always UNKNOWN
x BETWEEN ASYMMETRIC x AND x -- always TRUE
x BETWEEN SYMMETRIC 12 AND 15 -- depends on x
x BETWEEN SYMMETRIC 15 AND 12 -- depends on x
x BETWEEN SYMMETRIC NULL AND 15 -- always UNKNOWN
NULL BETWEEN SYMMETRIC 12 AND 15 -- always UNKNOWN
x BETWEEN SYMMETRIC 12 AND NULL -- always UNKNOWN
x BETWEEN SYMMETRIC x AND x -- always TRUE

19.1.3 Programming Tips

The BETWEEN range includes the end points, so you have to be careful. For example, changing a percent range on a test into a letter grade

Grades

low_scorehigh_scoregrade
90100‘A’
8090‘B’
7080‘C’
6070‘D’
060‘F’

will not work when a student gets a grade on the borderlines (90, 80, 70, or 60). One way to solve the problem is to change the table by adding 1 to the low scores. Of course, the student who got 90.1 will argue that he should have gotten an ‘A’ and not a ‘B’. If you add 0.01 to the low scores, the student who got 90.001 will argue that he should have gotten an ‘A’ and not a ‘B’ and so forth. This is a problem with a continuous variable. A better solution might be to change the predicate to (score BETWEEN low_score AND high_score) AND (score > low_score) or simply to ((low_score < score) AND (score <= high_score)). Neither approach will be much different in this example, since few values will fall on the borders between grades and this table is very, very small.

However, some indexing schemes might make the BETWEEN predicate the better choice for larger tables of this sort. They will keep index values in trees whose nodes hold a range of values (look up a description of the B tree family in a computer science book). An optimizer can compare the range of values in the BETWEEN predicate to the range of values in the index nodes as a single action. If the BETWEEN predicate were presented as two comparisons, it might execute them as separate actions against the database, which would be slower.

19.2 OVERLAPS Predicate

The OVERLAPS predicate requires the Standard SQL temporal data features that might not be in your implementation yet.

19.2.1 Time Periods and OVERLAPS Predicate

Temporal data types and functions are the most irregular features in SQL products. By the time the ANSI/ISO Standards were written, each dialect had its own set of functions and features. But let’s start with the concept of an INTERVAL, which is a measure of temporal duration, expressed in units such as days, hours, minutes, and so forth. This is how you add or subtract days to or from a date, hours and minutes to or from a time, and so forth. A time period is defined by a fixed starting point and fixed stopping point in time.

The OVERLAPS predicate compares two time periods. These time periods are defined as row values with two columns. The first column (the starting time) of the pair is always a < datetime > data type, and the second column (the termination time) is a < datetime > data type that can be used to compute a < datetime > value. If the starting and termination time are the same, this is an instantaneous event.

The BNF for this predicate is

< overlaps predicate >
  ::= < overlaps predicate part 1 > < overlaps predicate part 2 >
< overlaps predicate part 1 > ::= < row value predicand 1 >
< overlaps predicate part 2 > ::= OVERLAPS < row value predicand 2 >
< row value predicand 1 > ::= < row value predicand >
< row value predicand 2 > ::= < row value predicand >

The result of the < overlaps predicate > is formally defined as the result of the following expression:

 (S1 > S2 AND NOT (S1 >= T2 AND T1 >= T2))
 OR (S2 > S1 AND NOT (S2 >= T1 AND T2 >= T1))
 OR (S1 = S2 AND (T1 <> T2 OR T1 = T2))

where S1 and S2 are the starting time of the two time periods and T1 and T2 are their respective termination time.

The rules for the OVERLAPS predicate should be intuitive, but they are not. The principles that we wanted in the Standard were

1. A time period includes its starting point but does not include its end point. The reason for this model is that it follows the ISO convention that there is no TIME ‘24:00:00’ h today; it is TIME ‘00:00:00’ h tomorrow. Half-open durations have closure properties that are useful. The overlap, intersection, and concatenation of two half-open durations is an half-open duration.

2. If the time periods are not instantaneous, they overlap when they share a common time period.

3. If the first term of the predicate is an INTERVAL and the second term is an instantaneous event (a < datetime > data type), they overlap when the second term is in the time period (but is not the end point of the time period).

4. If the first and second terms are both instantaneous events, they overlap only when they are equal.

5. If the starting time is NULL and the finishing time is a known < datetime > value, the finishing time becomes the starting time and we have an event. If the starting time is NULL and the finishing time is an INTERVAL value, then both the finishing and starting time are NULL.

Please consider how your intuition reacts to these results, when the granularity is at the YEAR-MONTH-DAY level. Remember that a day begins at 00:00:00 h.

 (today, today) OVERLAPS (today, today) is TRUE
 (today, tomorrow) OVERLAPS (today, today) is TRUE
 (today, tomorrow) OVERLAPS (tomorrow, tomorrow) is FALSE
 (yesterday, today) OVERLAPS (today, tomorrow) is FALSE

This is still not very intuitive, so let’s draw pictures. Consider a table of hotel guests with the days of their stays and a table of special events being held at the hotel. The tables might look like this:

CREATE TABLE Guests
(guest_name CHARACTER(30) NOT NULL PRIMARY KEY,
 arrival_date DATE NOT NULL,
 departure_date DATE NOT NULL,
 . . .);

Guests

guest_namearrival_datedeparture_date
‘Doroth yGale’‘2016-02-01’‘2016-11-01’
‘Indiana Jones’‘2016-02-01’‘2016-02-01’
‘Don Quixote’‘2016-01-01’‘2016-10-01’
‘JamesT.Kirk’‘2016-02-01’‘2016-02-28’
‘Santa Claus’‘2016-12-01’‘2016-12-25’
CREATE TABLE Celebrations
 (celeb_name CHARACTER(30) PRIMARY KEY,
  celeb_start_date DATE NOT NULL,
  celeb_end_date DATE NOT NULL,
  . . .);

Celebrations

celeb_nameceleb_start_dateceleb_end_date
‘Apple Month’‘2016-02-01’‘2016-02-28’
‘Christmas Season’‘2016-12-01’‘2016-12-25’
‘Garlic Festival’‘2016-01-15’‘2016-02-15’
‘National Pear Week’‘2016-01-01’‘2016-01-07’
‘New Year’s Day’‘2016-01-01’‘2016-01-01’
‘St. Fred’s Day’‘2016-02-24’‘2016-02-24’
‘Year of the Prune’‘2016-01-01’‘2016-12-31’

The BETWEEN operator will work just fine with single dates that fall between the starting and finishing dates of these celebrations, but please remember that the BETWEEN predicate will include the end point of an interval and that the OVERLAPS predicate will not. To find out if a particular date occurs during an event, you can simply write queries like

SELECT guest_name, celeb_name
    FROM Guests, Celebrations
  WHERE arrival_date BETWEEN celeb_start_date AND celeb_end_date
  AND arrival_date <> celeb_end_date;

which will find the guests who arrived at the hotel during each event. The final predicate can be kept, if you want to conform to the ANSI convention, or dropped if that makes more sense in your situation. From now on, we will keep both end points to make the queries easier to read.

SELECT guest_name, celeb_name
  FROM Guests, Celebrations
 WHERE arrival_date BETWEEN celeb_start_date AND celeb_end_date;

Results

guest_nameceleb_name
‘Dorothy Gale’‘Apple Month’
‘Dorothy Gale’‘Garlic Festival’
‘Dorothy Gale’‘Year of the Prune’
‘Indiana Jones’‘Apple Month’
‘Indiana Jones’‘Garlic Festival’
‘Indiana Jones’‘Year of the Prune’
‘Don Quixote’‘National Pear Week’
‘Don Quixote’‘New Year's Day’
‘Don Quixote’‘Year of the Prune’
‘James T. Kirk’‘Apple Month’
‘James T. Kirk’‘Garlic Festival’
‘James T. Kirk’‘Year of the Prune’
‘Santa Claus’‘Christmas Season’
‘Santa Claus’‘Year of the Prune’

The obvious question is which guests were at the hotel during each event. A common programming error when trying to find out if two intervals overlap is to write the query with the BETWEEN predicate, thus:

SELECT guest_name, celeb_name
   FROM Guests, Celebrations
   WHERE arrival_date BETWEEN celeb_start_date AND celeb_end_date
OR departure_date BETWEEN celeb_start_date AND celeb_end_date;

This is wrong, because it does not cover the case where the event began and finished during the guest’s visit. Seeing his error, the programmer will sit down and draw a timeline diagram of all four possible overlapping cases, like this:

u19-01-9780128007617

So the programmer adds more predicates, thus:

SELECT guest_name, celeb_name
FROM Guests, Celebrations
WHERE arrival_date BETWEEN celeb_start_date AND celeb_end_date
 OR departure_date BETWEEN celeb_start_date AND celeb_end_date
 OR celeb_start_date BETWEEN arrival_date AND departure_date
 OR celeb_end_date BETWEEN arrival_date AND departure_date;

A thoughtful programmer will notice that the last predicate is not needed and might drop it, but either way, this is a correct query. But it is not the best answer. In the case of the overlapping intervals, there are two cases where a guest’s stay at the hotel and an event do not both fall within the same time frame: Either the guest checked out before the event started or the event ended before the guest arrived. If you want to do the logic, that is what the first predicate will work out to be when you also add the conditions that arrival_date <= departure_date and celeb_start_date <= celeb_end_date. But it is easier to see in a timeline diagram, thus:

u19-02-9780128007617

Both cases can be represented in one SQL statement as

SELECT guest_name, celeb_name
FROM Guests, Celebrations
 WHERE NOT ((departure_date < celeb_start_date) OR (arrival_date > celeb_end_date));

VIEW GuestsEvents

guest_nameceleb_name
‘Dorothy Gale’‘Apple Month’
‘Dorothy Gale’‘Garlic Festival’
‘Dorothy Gale’‘St. Fred's Day’
‘Dorothy Gale’‘Year of the Prune’
‘Indiana Jones’‘Apple Month’
‘Indiana Jones’‘Garlic Festival’
‘Indiana Jones’‘Year of the Prune’
‘Don Quixote’‘Apple Month’
‘Don Quixote’‘Garlic Festival’
‘Don Quixote’‘National Pear Week’
‘Don Quixote’‘New Year's Day’
‘Don Quixote’‘St. Fred's Day’
‘Don Quixote’‘Year of the Prune’
‘James T. Kirk’‘Apple Month’
‘James T. Kirk’‘Garlic Festival’
‘James T. Kirk’‘St. Fred's Day’
‘James T. Kirk’‘Year of the Prune’
‘Santa Claus’‘Christmas Season’
‘Santa Claus’‘Year of the Prune’

t0030

This VIEW is handy for other queries. The reason for using the NOT in the WHERE clause is that you can add or remove it to reverse the sense of the query. For example, to find out how many celebrations each guest could have seen, you would write

CREATE VIEW Guest_Celebrations (guest_name, celeb_name)
AS SELECT guest_name, celeb_name
    FROM Guests, Celebrations
  WHERE NOT ((departure_date < celeb_start_date) OR (arrival_date > celeb_end_date));
SELECT guest_name, COUNT(*) AS celebcount
   FROM Guest_Celebrations
   GROUP BY guest_name;

Results

guest_nameceleb_cnt
‘Dorothy Gale’4
‘Indiana Jones’3
‘Don Quixote’6
‘James T. Kirk’4
‘Santa Claus’2

and then to find out how many guests were at the hotel during each celebration, you would write

SELECT celeb_name, COUNT(*) AS guest_cnt
   FROM Guest_Celebrations
   GROUP BY celeb_name;

Result

celeb_nameguest_cnt
‘Apple Month’4
‘Christmas Season’1
‘Garlic Festival’4
‘National Pear Week’1
‘New Year's Day’1
‘St. Fred's Day’3
‘Year of the Prune’5

This last query is only part of the story. What the hotel management really wants to know is how many room nights were sold for a celebration. A little algebra tells you that the length of an event is (celeb_end_date − celeb_start_date + INTERVAL ‘1’ DAY) and that the length of a guest’s stay is ((Guest.departure_date − Guest.arrival_date) + 1) because the difference is an integer count of the days. Let’s do one of those timeline charts again:

u19-03-9780128007617

What we want is the part of the Guests interval that is inside the Celebrations interval.

Guests 1 and 2 spent only part of their time at the celebration; Guest 3 spent all of his time at the celebration; and Guest 4 stayed even longer than the celebration. That interval is defined by the two points (CASE WHEN arrival_date > celeb_start_date THEN arrival_date ELSE celeb_start_date END) and (CASE WHEN departure_date < celeb_end_date THEN departure_date ELSE celeb_end_date END).

Instead, you can use the aggregate functions in SQL to build a VIEW on a VIEW, like this:

CREATE VIEW Working (guest_name, celeb_name, entry_date, exit_date)
AS
SELECT GE.guest_name, GE.celeb_name, celeb_start_date, celeb_end_date
  FROM Guest_Celebrations AS GE, Celebrations AS E1
 WHERE E1.celeb_name = GE.celeb_name
UNION
SELECT GE.guest_name, GE.celeb_name, arrival_date, departure_date
  FROM Guest_Celebrations AS GE, Guests AS G1
 WHERE G1.guest_name = GE.guest_name;

VIEW Working

guest_nameceleb_nameentry_dateexit_date
‘Dorothy Gale’‘Apple Month’‘2016-02-01’‘2016-02-28’
‘Dorothy Gale’‘Apple Month’‘2016-02-01’‘2016-11-01’
‘Dorothy Gale’‘Garlic Festival’‘2016-02-01’‘2016-11-01’
‘Dorothy Gale’‘Garlic Festival’‘2016-01-15’‘2016-02-15’
‘Dorothy Gale’‘St. Fred's Day’‘2016-02-01’‘2016-11-01’
‘Dorothy Gale’‘St. Fred's Day’‘2016-02-24’‘2016-02-24’
‘Dorothy Gale’‘Year of the Prune’‘2016-02-01’‘2016-11-01’
‘Dorothy Gale’‘Year of the Prune’‘2016-01-01’‘2016-12-31’
‘Indiana Jones’‘Apple Month’‘2016-02-01’‘2016-02-01’
‘Indiana Jones’‘Apple Month’‘2016-02-01’‘2016-02-28’
‘Indiana Jones’‘Garlic Festival’‘2016-02-01’‘2016-02-01’
‘Indiana Jones’‘Garlic Festival’‘2016-01-15’‘2016-02-15’
‘Indiana Jones’‘Year of the Prune’‘2016-02-01’‘2016-02-01’
‘Indiana Jones’‘Year of the Prune’‘2016-01-01’‘2016-12-31’
‘Don Quixote’‘Apple Month’‘2016-02-01’‘2016-02-28’
‘Don Quixote’‘Apple Month’‘2016-01-01’‘2016-10-01’
‘Don Quixote’‘Garlic Festival’‘2016-01-01’‘2016-10-01’
‘Don Quixote’‘Garlic Festival’‘2016-01-15’‘2016-02-15’
‘Don Quixote’‘National Pear Week’‘2016-01-01’‘2016-01-07’
‘Don Quixote’‘National Pear Week’‘2016-01-01’‘2016-10-01’
‘Don Quixote’‘New Year's Day’‘2016-01-01’‘2016-01-01’
‘Don Quixote’‘New Year's Day’‘2016-01-01’‘2016-10-01’
‘Don Quixote’‘St. Fred's Day’‘2016-02-24’‘2016-02-24’
‘Don Quixote’‘St. Fred's Day’‘2016-01-01’‘2016-10-01’
‘Don Quixote’‘Year of the Prune’‘2016-01-01’‘2016-12-31’
‘Don Quixote’‘Year of the Prune’‘2016-01-01’‘2016-10-01’
‘James T. Kirk’‘Apple Month’‘2016-02-01’‘2016-02-28’
‘James T. Kirk’‘Garlic Festival’‘2016-02-01’‘2016-02-28’
‘James T. Kirk’‘Garlic Festival’‘2016-01-15’‘2016-02-15’
‘James T. Kirk’‘St. Fred's Day’‘2016-02-01’‘2016-02-28’
‘James T. Kirk’‘St. Fred's Day’‘2016-02-24’‘2016-02-24’
‘James T. Kirk’‘Year of the Prune’‘2016-02-01’‘2016-02-28’
‘James T. Kirk’‘Year of the Prune’‘2016-01-01’‘2016-12-31’
‘Santa Claus’‘Christmas Season’‘2016-12-01’‘2016-12-25’
‘Santa Claus’‘Year of the Prune’‘2016-12-01’‘2016-12-25’
‘Santa Claus’‘Year of the Prune’‘2016-01-01’‘2016-12-31’

t0045

This will put the earliest and latest points in both intervals into one column. Now we can construct a VIEW like this:

CREATE VIEW Attendees (guest_name, celeb_name, entry_date, exit_date)
AS
SELECT guest_name, celeb_name,
   MAX(entry_date), MIN(exit_date)
   FROM Working
   GROUP BY guest_name, celeb_name;

VIEW Attendees

guest_nameceleb_nameentry_dateexit_date
‘Dorothy Gale’‘Apple Month’‘2016-02-01’‘2016-02-28’
‘Dorothy Gale’‘Garlic Festival’‘2016-02-01’‘2016-02-15’
‘Dorothy Gale’‘St. Fred's Day’‘2016-02-24’‘2016-02-24’
‘Dorothy Gale’‘Year of the Prune’‘2016-02-01’‘2016-11-01’
‘Indiana Jones’‘Apple Month’‘2016-02-01’‘2016-02-01’
‘Indiana Jones’‘Garlic Festival’‘2016-02-01’‘2016-02-01’
‘Indiana Jones’‘Year of the Prune’‘2016-02-01’‘2016-02-01’
‘Don Quixote’‘Apple Month’‘2016-02-01’‘2016-02-28’
‘Don Quixote’‘Garlic Festival’‘2016-01-15’‘2016-02-15’
‘Don Quixote’‘National Pear Week’‘2016-01-01’‘2016-01-07’
‘Don Quixote’‘New Year's Day’‘2016-01-01’‘2016-01-01’
‘Don Quixote’‘St. Fred's Day’‘2016-02-24’‘2016-02-24’
‘Don Quixote’‘Year of the Prune’‘2016-01-01’‘2016-10-01’
‘James T. Kirk’‘Apple Month’‘2016-02-01’‘2016-02-28’
‘James T. Kirk’‘Garlic Festival’‘2016-02-01’‘2016-02-15’
‘James T. Kirk’‘St. Fred's Day’‘2016-02-24’‘2016-02-24’
‘James T. Kirk’‘Year of the Prune’‘2016-02-01’‘2016-02-28’
‘Santa Claus’‘Christmas Season’‘2016-12-01’‘2016-12-25’
‘Santa Claus’‘Year of the Prune’‘2016-12-01’‘2016-12-25’

t0050

The Attendees VIEW can be used to compute the total number of room days for each celebration. Assume that the difference of two dates will return an integer that is the number of days between them:

SELECT celeb_name,
SUM(exit_date - entry_date + INTERVAL '1' DAY)
  AS room_days
  FROM Attendees
  GROUP BY celeb_name;

Result

celeb_nameroom_days
‘Apple Month’85
‘Christmas Season’25
‘Garlic Festival'63
‘National Pear Week’ 7
‘New Year's Day’ 1
‘St. Fred's Day’ 3
‘Year of the Prune’602

If you would like to get a count of the room days sold in the month of January, you could use this query, which avoids a BETWEEN or OVERLAPS predicate completely.

SELECT SUM(CASE WHEN depart > DATE '2016-01-31'
       THEN DATE '2016-01-31'
        ELSE depart END
   - CASE WHEN arrival_date < DATE '2016-01-01'
         THEN DATE '2016-01-01'
         ELSE arrival_date END + INTERVAL '1' DAY)
  AS room_days
  FROM Guests
 WHERE depart > DATE '2016-01-01'
  AND arrival_date <= DATE '2016-01-31';
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.77.63