Chapter 5, “SELECT
Statement: Common Elements,” described data types as INTEGER
, CHAR
, and DATE
. These are the so-called base data types and are an integral part of SQL. They offer certain features, and we can apply predefined operations to them. For example, we can perform calculations on values with the INTEGER
data type, and we can apply operators such as +
and –
to them. However, the base data types of SQL are very elementary. Some users need much more complex and specialized data types. In an environment in which geographical data is stored, a data type as a two-dimensional (2D) coordinate, for example, would be very useful. Similarly, the data type color could be useful in a paint factory. Of course, we also need to have the operators for such data type. For the data type 2D coordinate, we would like to have operators as “calculate the distance between two coordinates” and, for color, “mix two colors.”
More SQL products allow users to define their own data types with related operators. To make the distinction clear, these are called user-defined data types. The SQL products and also the SQL3 standard support user-defined data types.
SQL supports several types of user-defined data types. In this chapter, we describe, among other things, the distinct, the opaque, and the named row types. For each data type, we explain the possibilities of the SQL products in this field and discuss what has been defined in the SQL3 standard. Creating user-defined functions and operators also is explained.
User-defined data types must be created, of course. For this, a special CREATE
statement exists, just as there exist statements for creating tables, views, and synonyms.
Example 33.1. Create the data types called PAYMENTNO
, PLAYERNO
, and MONEYAMOUNT
, and use them in the CREATE TABLE
statement for the PENALTIES
table.
CREATE TYPE PAYMENTNO AS INTEGER CREATE TYPE PLAYERNO AS INTEGER CREATE TYPE MONEYAMOUNT AS DECIMAL(7,2) CREATE TABLE PENALTIES (PAYMENTNO PAYMENTNO NOT NULL PRIMARY KEY, PLAYERNO PLAYERNO, PAY_DATE DATE, AMOUNT MONEYAMOUNT)
Explanation: The user-defined data types are used in positions where base data types usually occur. This is always allowed. Wherever a base data types can be used, a user-defined data type can also be used. The example also shows that column names and names of data types can be the same.
User-defined data types have many similarities with base data types. One is that a data type has no population or “contents” (a table does, on the other hand). For example, with INSERT
statements, rows are added to a table and the contents are built up. However, INSERT
and other statements cannot be executed on a data type. Data types cannot be manipulated in any way. Therefore, we cannot request all possible values of the INTEGER
or a user-defined data type by using a SELECT
statement. One could say that a data type has a static, virtual content. This virtual content consists of all values that might occur in the underlying data type. Therefore, all numeric values between -9,999,999.99
and 999,999.99
are allowed in the MONEYAMOUNT
data type. An SQL data type is, in fact, comparable to a type in Pascal or a class in Java; the data type describes possible values.
From the previous CREATE TYPE
statements, it is obvious that a user-defined data type depends upon a base data type. Moreover, user-defined data types can also refer to each other.
Several types of user-defined data types exist. Those created earlier are called distinct data types. A distinct data type is directly or indirectly (through another distinct data type) defined upon an existing base data type. In the next sections, the other types are described.
One of the great advantages of working with user-defined data types is that apples cannot be compared to pears. The following SELECT
statement was allowed with the original definition of the PENALTIES
table, but not any longer:
SELECT * FROM PENALTIES WHERE PAYMENTNO > AMOUNT
This was allowed because both columns were numeric. Now that the AMOUNT
column is defined on the data type MONEYAMOUNT
, it can be compared only to columns defined on that same data type. This might sound like a restriction but is actually an advantage. The condition in the SELECT
statement was an odd question anyhow. In other words, the advantage of working with user-defined data types is that senseless statements are rejected. In the world of programming languages, this is called strong typing. Languages such as Algol, Pascal, and Java have supported this concept from the beginning. Note that it is possible to compare values of different data types, but then we have to indicate that clearly. We return to this topic later.
It is also easy to remove data types:
DROP TYPE MONEYAMOUNT
What happens if a data type is removed while columns are defined on it? The answer to this question is, again, that it depends on the product. Some products allow a data type to be removed only if there are no columns or other user-defined data types defined on it. Other products do allow the removal and replace the user-defined data type of the column with the underlying data type. In other words, the specification of the dropped data type is copied to all the columns with that data type.
In the literature on the relational model, instead of the term data type, the term domain is used regularly.
Usually, data types have an owner. The person who creates them is the owner of that data type. Other users can use the data type in their own CREATE TABLE
statements, but they must be granted permission explicitly. A special version of the GRANT
statement is introduced for granting this privilege.
Example 33.3. Give JIM
permission to use the MONEYAMOUNT
data type.
GRANT USAGE ON TYPE MONEYAMOUNT TO JIM
Explanation: USAGE
is the new form. After this GRANT
statement is executed, JIM
can define tables with columns based on this new data type.
Note that some products do not use the word USAGE
for this; they use the word EXECUTE
, just as for stored procedures. The meaning and effect are identical.
Of course, this statement also has a counterpart:
REVOKE USAGE ON TYPE MONEYAMOUNT TO JIM
But what happens when the privilege is revoked after JIM
has used the data type in a CREATE TABLE
statement? The effect of this statement also depends on the product, but most products employ the following rule: The right to use a data type can be revoked only if the user has not used the data type yet.
In Section 33.2, we indicated that the use of user-defined data types involves strong typing, but what if we nevertheless want to compare apples to pears? To this end, we have to change the data type of the values. For this, we use an explicit form of casting, as discussed in Section 5.11, in Chapter 5.
For each new data type, SQL automatically creates two new scalar functions for casting. One function transforms values of the user-defined data type to values of the underlying base data type (this function carries the name of the base data type); the other works the other way around and carries the name of the user-defined data type. These functions are called destructor and constructor, respectively. For the data type MONEYAMOUNT
, the destructor is called DECIMAL
and the constructor MONEYAMOUNT
. Note that in object-oriented programming languages, these two terms are used to remove and create objects, respectively.
Example 33.4. Find the payment numbers of the penalties of which the penalty amount is greater than $50.
Explanation: In the first SELECT
statement, the value 50
(which is probably a “normal” number for the INTEGER
data type) is transformed into a money amount. Then, it can be compared to comparable values in the AMOUNT
column. Thus, the constructor MONEYAMOUNT
constructs money amounts out of numeric values. The second statement shows that money amounts can be converted into “normal” numbers by using the destructor called DECIMAL
. The result of both statements is, of course, the same.
Example 33.5. Find the payment numbers of the penalties of which the player number is greater than the penalty amount.
SELECT PAYMENTNO FROM PENALTIES WHERE INTEGER(PLAYERNO) > INTEGER(PAYMENTNO)
Explanation: Because the PLAYERNO
and the PAYMENTNO
data types are created on the same base data type, which is INTEGER
, they both have a destructor called INTEGER
. In other words, now there are two functions with the same name, but they work on different data types. This does not cause any problems within SQL, which can keep the two functions apart because the parameters of the functions are different with respect to their data types. This concept, in which different functions carry the same name, is called overloading. The function name INTEGER
is overloaded in this example.
Note that to change the data type of a value to compare it to values that have another data type is sometimes called semantic override.
Casting of values is also important when you enter new values with INSERT
and UPDATE
statements. Now that three columns in the PENALTIES
table have a user-defined data type, we can no longer put simple numeric values in this column. We are forced to use casting with the INSERT
statement now.
Just as in any programming language, SQL supports operators such as +
, -
, *
, and /
. We described them in Chapter 5. A few general remarks about these operators:
In theory, these operators are not required. For operators such as +
and *
, the functions ADD_UP
and MULTIPLY
could have been created. These operators have been added, however, to make things easier.
As stated, every base data type has a number of possible operations. For example, with the numeric data types, we can employ operations such as add, multiply, and subtract so that we can add a couple of months to the date data type and create a new date.
Overloading of functions is described in Section 33.4. Overloading of operators also exists: Whether we use the +
for two numbers or for two alphanumeric values leads to completely different results. Depending on the data types of the values, the two numbers are added or the alphanumeric values are concatenated.
Some SQL products allow operators to be created for user-defined data types. In principle, these are the operations that apply to the underlying data type, but we can also define our own operations. SQL products enable you to do this only for scalar functions.
Let us continue with the discussion of the data type MONEYAMOUNT
. Imagine that there are two columns, AMOUNT1
and AMOUNT2
, that are both defined on this data type MONEYAMOUNT
and that we want to add up. Because MONEYAMOUNT
is not a normal numeric value, we cannot use the operators +
and –
. The following expression would no longer be allowed:
AMOUNT1 + AMOUNT2
This must now be done with the following expression:
DECIMAL(AMOUNT1) + DECIMAL(AMOUNT2)
We can solve this more elegantly by also defining the +
symbol for values of the MONEYAMOUNT
data type.
CREATE FUNCTION "+" (MONEYAMOUNT, MONEYAMOUNT) RETURNS MONEYAMOUNT SOURCE "+" (DECIMAL(), DECIMAL())
Explanation: The +
operator is defined once again, and again it is overloaded. It makes the expression AMOUNT1 + AMOUNT2
legal.
Imagine that the data type COLOR
and the function MIX
(to mix two colors) have been defined. Next, the +
operator can be created, for example, as an operator to mix two colors.
CREATE FUNCTION "+" (COLOR, COLOR) RETURNS COLOR SOURCE MIX (COLOR, COLOR)
The capability to define user-defined operators does not increase the functionality of SQL, but it makes it easier to formulate certain statements.
A distinct data type is based on one base data type and inherits all the features of that base data type. In addition, some products enable you to define completely new data types that are not dependent on a base data type. These are called opaque data types. Opaque means “nontransparent.” You could say that an opaque data type is a user-defined base data type.
Opaque data types are required when it is too complex to define them with the help of a base data type. For example, if we want to define the data type 2D coordinate, we must store two numbers somehow: the X and the Y coordinates. This does not work if we use only base data types. However, we can do it with opaque data types, as shown by the next example.
Example 33.7. Create the data type TWODIM
to store two-dimensional coordinates.
CREATE TYPE TWODIM (INTERNALLENGTH = 4)
Explanation: What is clearly noticeable is that there is indeed no base data type used in the CREATE TYPE
statement. The only thing that is registered is how much space one value of this type will occupy on disk—namely, 4 bytes, which has been selected because, for the sake of convenience, we assume that a coordinate consists of two whole numbers.
However, before this new data type can be used in a CREATE TABLE
statement, we have to define a number of functions. We must create, for example, a function that converts a value, entered by the user, to something that is stored on hard disk and a function that works the other way around. This is not required for base data types. If we use the CHAR
data type, we assume that these functions already exist. Now, we must create them ourselves. We do not go more deeply into this topic because it depends strongly on the product. We simply note that, in addition to the required functions, other functions can be defined to increase the functionality. In most cases, these are external functions.
The third user-defined data type is the named row data type. With it, we can group values logically belonging to each other as one unit. For example, all values belonging to an address are grouped.
Example 33.8. Create the named row data type called ADDRESS
and use it in a CREATE TABLE
statement.
CREATE TYPE ADDRESS AS (STREET CHAR(15) NOT NULL, HOUSENO CHAR(4), POSTCODE CHAR(6), TOWN CHAR(10) NOT NULL) CREATE TABLE PLAYERS (PLAYERNO INTEGER PRIMARY KEY, NAME CHAR(15), : : RESIDENCE ADDRESS, PHONENO CHAR(13), LEAGUENO CHAR(4))
Explanation. Instead of having to define four columns in the CREATE TABLE
statement, only one will do: RESIDENCE
. That means that in one row in the column RESIDENCE
, not one value, but a row with four values, is stored. This row of four values has a name (or, in other words, is named) of ADDRESS
, which explains the term named row. The column RESIDENCE
is a composite column. For each column belonging to a named row data type, a NOT NULL
specification can be included.
Of course, you can use a data type several times in the same CREATE TABLE
statement, for example:
CREATE TABLE PLAYERS (PLAYERNO INTEGER PRIMARY KEY, : : RESIDENCE ADDRESS, MAILING_ADDRESS ADDRESS, HOLIDAY_ADDRESS ADDRESS, PHONENO CHAR(13), LEAGUENO CHAR(4))
Working with composite columns affects the formulations of SELECT
and other statements. We illustrate this with some examples.
Example 33.9. Get the numbers and complete addresses of the players resident in Stratford.
SELECT PLAYERNO, RESIDENCE FROM PLAYERS WHERE RESIDENCE.TOWN = 'Stratford'
The result is:
PLAYERNO <=============== RESIDENCE ===========> STREET HOUSENO POSTCODE TOWN -------- -------------- ------- -------- --------- 6 Haseltine Lane 80 1234KK Stratford 83 Magdalene Road 16A 1812UP Stratford 2 Stoney Road 43 3575NH Stratford 7 Edgecombe Way 39 9758VB Stratford 57 Edgecombe Way 16 4377CB Stratford 39 Eaton Square 78 9629CD Stratford 100 Haseltine Lane 80 1234KK Stratford
Explanation. In the SELECT
clause, only one column has to be specified instead of four. Obviously, the result consists of five columns. The notation RESIDENCE.TOWN
is new. This point notation indicates that only a part of the address is requested.
Example 33.10. Get the numbers of the players living at the same address as player 6.
SELECT OTHERS.PLAYERNO FROM PLAYERS AS P6, PLAYERS AS OTHERS WHERE P6.RESIDENCE = OTHERS.RESIDENCE AND P6.PLAYERNO = 6
Explanation: Instead of a join condition on four columns (STREET
, HOUSENO
, TOWN
, and POSTCODE
), one simple join condition, in which the composite column is used, is sufficient.
Casting of values is also important with named row data types. We give you an example of a SELECT
and an INSERT
statement.
Example 33.11. Get the number and name of the player living at the address 39 Edgecombe Way, Stratford, with postcode 9758VB.
SELECT PLAYERNO, NAME FROM PLAYERS WHERE RESIDENCE = ADDRESS('Edgecombe Way', 39, '9758VB', 'Stratford')
Explanation: In this example, we can see clearly how the four values are cast into one ADDRESS
value so that they can be compared with the column RESIDENCE
.
Example 33.12. Enter a new player.
INSERT INTO PLAYERS (PLAYERNO, NAME, ..., ADDRESS, PHONENO, LEAGUENO) VALUES (6, 'Parmenter', ..., ADDRESS('Haseltine Lane', 80, '1234KK', 'Stratford'), '070-476537', 8467)
Named row data types are usually defined on base and distinct data types, but they can also be “nested.” An example is given next. First, the data type POSTCODE
is defined, consisting of two components: a part of four digits and a part of two letters. Next, this new named row data type is used in the definition of the ADDRESS
data type.
CREATE TYPE POSTCODE AS (DIGITS CHAR(4), LETTERS CHAR(2)) CREATE TYPE ADDRESS AS (STREET CHAR(15) NOT NULL, HOUSENO CHAR(4), POSTCODE POSTCODE, TOWN CHAR(10) NOT NULL)
Example 33.13. Get the numbers and the full addresses of the players resident in postcode area 2501.
SELECT PLAYERNO, RESIDENCE FROM PLAYERS WHERE RESIDENCE.POSTCODE.DIGITS = '2501'
Explanation: In the condition, two values are grouped into one value with a POSTCODE
data type. A casting function is used for this.
In addition to the named row data type, some SQL products support the unnamed row data type. This data type also puts values together, but this group does not get a separate name.
CREATE TABLE PLAYERS (PLAYERNO INTEGER PRIMARY KEY, NAME CHAR(15), : : RESIDENCE ROW (STREET CHAR(15) NOT NULL, HOUSENO CHAR(4), POSTCODE CHAR(6), TOWN CHAR(10) NOT NULL), PHONENO CHAR(13), LEAGUENO CHAR(4))
Explanation: We can see that the four values are grouped together here. However, no data type is defined explicitly. The effect of an unnamed row data type on SELECT
and other statements is the same as that of the named row data type. The difference, however, is that the specification cannot be reused in several places. If there is also a MAILING_ADDRESS
column, we must define the four subcolumns once again.
For casting of values, the word ROW
is used:
INSERT INTO PLAYERS (PLAYERNO, NAME, ..., ADDRESS, PHONENO, LEAGUENO) VALUES (6, 'Parmenter', ..., ROW('Haseltine Lane', 80, '1234KK', 'Stratford'), '070-476537', 8467)
What is wrong in the following SELECT RESIDENCE FROM PLAYERS WHERE RESIDENCE LIKE '12%' | |
Create the data type |
So far, we have used the named row data type only to specify columns, but this data type can also be used to assign a data type to a table. The result is that it is no longer necessary to specify the columns and their data types explicitly; instead, the columns of the named row data type must form the columns of the table.
Example 33.15. Create a type for the PLAYERS
table.
CREATE TYPE T_PLAYERS AS (PLAYERNO INTEGER NOT NULL, NAME CHAR(15) NOT NULL, INITIALS CHAR(3) NOT NULL, BIRTH_DATE DATE, SEX CHAR(1) NOT NULL, JOINED SMALLINT NOT NULL, STREET CHAR(15) NOT NULL, HOUSENO CHAR(4), POSTCODE CHAR(6), TOWN CHAR(10) NOT NULL, PHONENO CHAR(13), LEAGUENO CHAR(4)) CREATE TABLE PLAYERS OF T_PLAYERS (PRIMARY KEY PLAYERNO)
Explanation: With the specification OF T_PLAYERS
in the CREATE TABLE
statement, we indicate that all the columns of the PLAYERS
table are of that data type. Nevertheless, certain constraints must still be specified, and that explains the specification of the primary key. The NOT NULL
integrity constraint is the only rule that can be included within the CREATE TYPE
statement. A table that is defined in this way is called a typed table. Whether a table is typed or nontyped has no impact on SELECT
and update statements.
The advantage of typed tables is that tables with the same structure can be defined in a very simple way. Imagine that there is another PLAYERS
table consisting of players who used to be members of the tennis club. This table probably has the same columns, so it should now be easy to create the table:
CREATE TABLE OLD_PLAYERS OF T_PLAYERS (PRIMARY KEY PLAYERNO)
Some SQL products enable you to specify integrity constraints on a data type. These integrity constraints restrict the permitted values of the data type and, thus, the populations of the columns defined on that data type.
Example 33.16. Define the data type NUMBER_OF_SETS
and specify that only the values 1
, 2
, and 3
are legal.
CREATE TYPE NUBER_OF_SETS AS SMALLINT CHECK (VALUE IN (0, 1, 2, 3)) CREATE TABLE MATCHES (MATCHNO INTEGER PRIMARY KEY, TEAMNO INTEGER NOT NULL, PLAYERNO INTEGER NOT NULL, WON NUMBER_OF_SETS NOT NULL, LOST NUMBER_OF_SETS NOT NULL)
Explanation: In the CREATE TYPE
statement, a check integrity constraint is specified. This constraint indicates, with a condition, the legal values. Values are legal when they satisfy the condition. The reserved word VALUE
stands for a possible value of that specific data type. Any simple condition can be used here, which means that comparison operators AND
, OR
, NOT
, BETWEEN
, IN
, LIKE
, and IS NULL
all can be used. Subqueries are not allowed, however.
Now the advantage is that if the integrity constraint for NUMBER_OF_SETS
changes, this has to be carried out only in one place.
Example 33.17. Change the data type NUMBER_OF_SETS
so that the value 4
is also permitted.
ALTER TYPE NUMBER_OF_SETS AS SMALLINT CHECK (VALUE BETWEEN 0 AND 4)
When a condition is changed, a problem could arise if we make the condition more restrictive. Imagine that NUMBER_OF_SETS
is defined as only the values 0
, 1
, and 2
. What happens if the columns defined on this data type already have a value that is beyond this range? Products solve this by not allowing such a change of the data type to occur. First, the columns must be adjusted.
Primary keys, foreign keys, and indexes can be created on columns with user-defined data types. For the named row data types, they can be defined on the full value or on a part of it.
Example 33.18. Define an index on the column RESIDENCE
in the PLAYERS
table.
CREATE INDEX I_RESIDENCE ON PLAYERS(RESIDENCE)
Example 33.19. Define an index on only the POSTCODE
part of the column RESIDENCE
in the PLAYERS
table.
CREATE INDEX I_RESIDENCE ON PLAYERS(RESIDENCE.POSTCODE)
The only exception is when indexes on opaque data types must be defined. Each product offers very different features here.
3.147.65.247