12.7. SQL: in, between, like, and is null Operators

This section examines four special operators (other than the comparators =, <, etc.) that SQL provides for use within search conditions: in; ...between...and...; like and is null. These are sometimes called functions, and the conditions they are used to express are called “predicates” in the SQL standard.

A function is something that takes zero or more values as arguments and returns a single value as its result. You are probably familiar with functions from mathematics or programming, such as cos (x) or sqrt (x). As these examples illustrate, syntactically a function is usually represented as a function identifier preceding its arguments, which are typically included in parentheses.

When the action performed by a function is represented without bracketing all the arguments, we usually describe the notation as involving operators and operands rather than functions and arguments. Operators may be represented in infix, prefix, postfix, and mixfix notation according to whether the operator appears between, before, after, or mixed among the operands. For instance, the sum of 2 and 3 might be set out as:

sum (2,3)-- function
2 + 3-- infix operator
+ 23-- prefix operator
23 +-- postfix operator
sum of 2 and 3-- mixfix operator

The four operators we are about to discuss are used to express search conditions. The first three return the value True, False, or Unknown, while the is null operator returns True or False. Our initial treatment focuses on the SQL-89 version of these operators. Extensions for SQL-92 onwards are mentioned later. Most of our examples are based on the Person table, reproduced here as Table 12.9.

Table 12.9. A relational table storing personal details.


Suppose we want the names and birth years of the people born in 1950, 1967, or 1974. One way of requesting this information is shown in the following SQL query. As an exercise, check that this results in four rows.

select firstname, birthyr from Person

where birthyr = 1950 or birthyr = 1967 or birthyr = 1974

Imagine how tedious this way of phrasing the request would be if there were a dozen or more years involved. Partly to make life easier in such situations, SQL includes an in operator to handle bag membership and hence set membership. Using this infix operator, the aforementioned request may be formulated more briefly as:

select firstname, birthyr from Person

where birthyr in (1950,1967,1974)

Here the search condition is that the birthyr value is a member of the bag containing the values 1950, 1967, and 1974. The order in which these values are written does not matter, nor would it matter if any are duplicated. In general, if x is some expression (e.g., a column name) and a, b, etc. are data values (e.g., numeric or string constants), then the SQL condition shown here on the left is equivalent to the mathematical expression shown on the right:

x in (a, b...) means x ε [a,b...]

We use in instead of “∊”, and parentheses “()” instead of square brackets “[ ]” for bag delimiters. Since a set may be thought of as a bag with no duplicates, the in operator may also be used for set membership. Unlike many programming languages, SQL allows bags or sets to contain character strings (not just numbers). For example, the following query lists the Aquarians and Taureans (Eve, Selena, and Terry):

select firstname from Person

where starsign in (‘Aquarius’,Taurus’)

To indicate that the value of an expression does not belong to a bag or set, the logical not operator may be used with the in operator. As with ordinary comparison operators, SQL gives the in operator higher priority than logical operators (unlike most programming languages). In SQL the condition “not (x in S)” may be rendered more briefly as not x In S. Even better, SQL also allows not in for “∉”, so this may set out more naturally as x not in S.

x not in (a, b...)    means     x∉[a,b...]

For example, to obtain the names and birth years of those not born in any of the years 1950, 1967, or 1974 the following query may be used (yielding Fred and Terry):

select firstname, birthyr from Person

where birthyr not in (1950,1967,1974)

Sometimes we wish to determine whether an expression has a value occurring in a range of values. In mathematics, to say that some variable x has a value in the range from a to b we usually express this as: a ≤ x ≤ b. This notation is illegal in SQL. Instead we could say: a <= x and x <= b. For the sample population, the following query returns the name and birth year of all but Terry.

select firstname, birthyr from Person

where 1950 <= birthyr and birthyr <= 1974

As another way to specify range membership, SQL provides the ternary mixfix operator “...between...and...” which may be defined as shown, where x, a, and b may be scalar expressions:

x between a and b;  means        a <= x and x  <= b

For example, the previous query may be reformulated as:

select firstname, birthyr from Person

where birthyr between 1950 and 1974

Note carefully that, in contrast to ordinary English, between in SQL is read in an inclusive sense. For instance, both 1950 and 1974 are included in the given range.

Notice also that here and is just part of the mixfix operator: it is not a logical operator. As already noted, the < = operator may be used to order strings as well as numbers. So strings may be used as operands. For example, the following query returns the bag (‘Bob’, ‘Eve’, ‘Fred’).

select firstname from Person

where firstname between‘Bob’ and‘Fred’

Nonmembership in a range may be expressed with the help of the not operator, which has lower priority than between, and may be placed just before the word between. Thus each of the following conditions is equivalent to x < a or x > b.

x not between a and b    not ( x between a and b)      not x between a and b

The not between formulation is easier to read. For example, the following query returns just the tuple (‘Terry’, 1946).

select firstname, birthyr from Person

where birthyr not between 1950 and 1974

In the pure relational model, column values are atomic. However, SQL provides a number of ways of accessing substrings within character string values. In particular, the like operator may be used for pattern matching with character strings. The general form of the condition in SQL-89 may be set out in BNF thus:

char-col[not] like quoted-string [escape quoted-char]

Here char-col is the name of a column based on a character string data type. The quoted string is a character string, surrounded by single quotes, which may contain wildcard characters. You may be familiar with the use of wildcards for matching filenames at the operating system level (e.g., “*” and “?” in MS-DOS). In the absence of an escape clause, SQL gives the percentage character “%” and the underscore character “_” the following special meanings if included in a quoted string operated on by like:

%meanszero or more characters
meansany single character

The “%”is generally more useful, but “_” is needed if the character’s position in the string is important. Figure 12.47 provides a few examples based on Table 12.9. The like operator has priority over logical operators, so “x not like s” means “not (x like s)”.

Figure 12.47. “%” and “_” wildcards for pattern matching with the like operator.


Note that the “%” and “_” are interpreted as wildcard characters only in the context of a like condition. For example, the following query returns the null set. There are no rows that satisfy the condition, since if “=” is used instead of like the expression “A%” is taken literally.

select starsign from Person

where starsign = ‘A%’

As discussed earlier, to include a single quote in a string, we use two single quotes. For example, for the table scheme Actor ( surname, firstname, gender) considered earlier, the following query returns each actor whose surname begins with “D”’: select surname from Actor where surname like ‘D”%’.

To see a few examples where the like operator is quite useful, consider Table 12.10, which indicates which subjects are offered in which semester at a given university. The sample population is small to save space. At this university, subject codes have the following meaning: the first two characters indicate the discipline area (e.g., “CS” denotes Computer Science and “PD” denotes Philosophy) and the third character denotes the year level (e.g., “1” for first year level).

Table 12.10. A relational table showing subjects offered and semesters.


Try to formulate the following queries yourself before peeking at the answers.

List the computer science subjects.

List the first year level subjects.

List subjects higher than first level that are offered in semester 2.

The queries in order are:

select distinct subject from Offering where subject like ‘CS%’

select distinct subject from Offering where subject like ‘__1%’

select distinct subject from Offering

where subject not like ‘__1%’ and semester = 2

Another common example of a string that is often interpreted as having structure is a person name. Usually a person’s surname, as well as either a first name or initials, is recorded (e.g., “Smith, James” or “Smith JB”). Suppose we wish to distinguish between these two parts of a person’s name. One way to implement this is to include two columns, one for the surname and another for the firstname (or initials). The structure is then known to the system, and each of these two parts of the name can be accessed individually (this is especially handy for SQL’s group by facility). Alternatively, we might use just one column to store the whole name and then use the like operator to distinguish the two parts—the name structure then becomes derived rather than stored.

For example, consider Table 12.11, which provides details of players in a mixed doubles tennis match. Try the following queries and then check your answers.

List details of people with surname “Smith”.

List details of people with firstname “James”.

In order, the queries are:

select * from Player where personName like‘smith, %’

select * from Player where personName like‘%, James’

Table 12.11. A relational table concerning tennis players.


Conditions using the like operator may optionally include an escape character. If chosen, it may be used as a lead-in character to have “%” and “_” interpreted literally rather than as wildcards. For example, the query in Figure 12.48 lists starships with an underscore character in their name. Here “” is used as the lead-in character; any character that doesn’t occur in the string being investigated could be used. Some SQL dialects extend the capability of the like operator with additional wildcards. For example, SQL Server uses [ ] to include a range of characters, and [^] to exclude characters.

Figure 12.48. Here “” is used as an escape character to detect an underscore.


Although nulls may be displayed (e.g., as “?”), they cannot be used with the usual comparison operators. Instead, a special postfix Boolean ...is null operator is used. This operator is placed after the name of the column on which it operates. It returns True if the column value is null, and False otherwise. For example, the following query on the Player table returns ‘Smith, Sue’, since her height is unknown:

select personName from Player

where height is null

Neither of the expressions “height = null” nor “height = ‘?‘” is allowed. To specify that a value is not null, use the logical not operator before the column name or the word null. For instance, either query shown here returns the names of the three other players.

select personName from Player where not height is null

select personName from Player where height is not null

Using “col” to denote a column, the BNF syntax for null comparisons in SQL-89 is:

col is [not] null

While the is null operator always returns True or False, other comparison operators return unknown when one of the arguments is the null value. Suppose we want the names of the players who are between 170 and 175 cm tall (inclusive). The following query returns just two names: ‘Smifhers, James’, ‘James, Susan’.

select personName from Player
   where height between 170 and 175

Although Sue Smith’s height might be in this range in the real world, she is excluded from this result. The system does not know her height, so will not evaluate the condition to True in her case. Suppose now we want the names of those players whose height is not in the range 170..175. The following query returns just one name: ‘Smith, James’.

select personName from Player

where height not between 170 and 175

Notice that Sue Smith is excluded from this result as well. As far as the system is concerned, in her case the truth value of the condition “height between 170 and 175” is neither true nor false; rather it is unknown.

A row is included in a result only if it satisfies the search condition, i.e., the condition is (known by the system to be) True for that row. If the condition is either False or Unknown, the row is excluded from the result. As an extreme example, the following query returns all the players except Sue Smith.

select personName from Player

where height = 170 or height <> 170

For Sue Smith’s row, each comparison in this condition evaluates to unknown, and applying the or operator to unknowns also gives an unknown. The truth value of any condition is evaluated according to a three-valued logic, as set out in Figure 12.49. Here “1”, “0”, and “?” denote the values “True”, “False”, and “Unknown”, respectively.

Figure 12.49. Truth tables for three-valued logic (1 = True, 0 = False, ? = Unknown).


Although SQL-89 and SQL-92 allow only one kind of null value, Codd (1990) proposed a four-valued logic in “version 2” of the relational model to allow for two kinds of unknown (applicable and inapplicable), but this has not received popular approval.

A summary of the SQL-89 syntax of the four operators introduced in this section is shown. Here “expn” denotes an expression such as a column name or constant (or a combination of these connected by arithmetic operators +, -, *, / ... see later).

expn[not] in (constant-list)

expn [not] between expn and expn

char-col[not] like quoted-string[escape quoted-char]

col is [not] null

In addition, SQL-89 allows the in operator to be used with subqueries, and the use of quantified comparisons and the exists quantifier in search conditions (see later). SQL-92 (and later) significantly extended the kinds of expressions and operators allowed in search conditions.

For example, SQL-92 allows a row value expression (e.g., a tuple of values) as the left argument of any comparator (e.g., =, in) and includes unique, match, and overlaps predicates. Unique is used to check whether each null-free row returned by a subquery is unique, match is used to check whether a row matches a row returned by a subquery, and overlaps is used to test whether two datetime periods overlap.

SQL: 1999 goes further, adding similar, distinct, and type predicates for use in search conditions. Similar allows character strings to be compared by means of a regular expression, distinct tests whether two row values are distinct, and type tests whether a user-defined type value expression conforms to a user-defined type specification. Dialects of SQL differ in their support for these standard predicates. However, all SQL dialects now support the SQL-89 syntax discussed in this section.

Exercise 12.7

  1. The table contains data about computer languages. Phrase the following queries in SQL.

    1. Which languages were released in the years 1959, 1975, or 1979?

    2. Which languages were released in the period 1959- 1979?

    3. Which languages do not have their release year recorded?

    4. List the name and release year of any Goodo language.

    5. List the name of any language ending with “OL”.

    6. List the name and release year of any Pascal language.

    7. List language names that are five characters long.

    8. List language names with ‘o’ as the second character.

    9. List the name of any language containing the letter “O” or “o”.

    10. List the name and release year of languages not starting with “P” that were released after 1959 and before 1979.

    11. List language names that either are six characters long and have “a” as the last or second last character or are five characters long and end with “OL”.

  2. Suppose that in later years computer languages with the following titles were released: Modula_3; Oberon_2; Ada%93; COBOL%93. Write an SQL query to list all languages whose title includes an underscore or a percentage sign.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.116.63.5