Chapter 33. User-Defined Data Types, Functions, and Operators

Introduction

Chapter 5, “SELECT Statement: Common Elements,” described data types as INTEGER, CHAR, and DATE. These are the so-called base data types and are an integral part of SQL. They offer certain features, and we can apply predefined operations to them. For example, we can perform calculations on values with the INTEGER data type, and we can apply operators such as + and to them. However, the base data types of SQL are very elementary. Some users need much more complex and specialized data types. In an environment in which geographical data is stored, a data type as a two-dimensional (2D) coordinate, for example, would be very useful. Similarly, the data type color could be useful in a paint factory. Of course, we also need to have the operators for such data type. For the data type 2D coordinate, we would like to have operators as “calculate the distance between two coordinates” and, for color, “mix two colors.”

More SQL products allow users to define their own data types with related operators. To make the distinction clear, these are called user-defined data types. The SQL products and also the SQL3 standard support user-defined data types.

SQL supports several types of user-defined data types. In this chapter, we describe, among other things, the distinct, the opaque, and the named row types. For each data type, we explain the possibilities of the SQL products in this field and discuss what has been defined in the SQL3 standard. Creating user-defined functions and operators also is explained.

Creating User-Defined Data Types

User-defined data types must be created, of course. For this, a special CREATE statement exists, just as there exist statements for creating tables, views, and synonyms.

Example 33.1. Create the data types called PAYMENTNO, PLAYERNO, and MONEYAMOUNT, and use them in the CREATE TABLE statement for the PENALTIES table.

CREATE TYPE PAYMENTNO AS INTEGER

CREATE TYPE PLAYERNO AS INTEGER

CREATE TYPE MONEYAMOUNT AS DECIMAL(7,2)

CREATE   TABLE PENALTIES
        (PAYMENTNO   PAYMENTNO NOT NULL PRIMARY KEY,
         PLAYERNO    PLAYERNO,
         PAY_DATE    DATE,
         AMOUNT      MONEYAMOUNT)

Explanation: The user-defined data types are used in positions where base data types usually occur. This is always allowed. Wherever a base data types can be used, a user-defined data type can also be used. The example also shows that column names and names of data types can be the same.

User-defined data types have many similarities with base data types. One is that a data type has no population or “contents” (a table does, on the other hand). For example, with INSERT statements, rows are added to a table and the contents are built up. However, INSERT and other statements cannot be executed on a data type. Data types cannot be manipulated in any way. Therefore, we cannot request all possible values of the INTEGER or a user-defined data type by using a SELECT statement. One could say that a data type has a static, virtual content. This virtual content consists of all values that might occur in the underlying data type. Therefore, all numeric values between -9,999,999.99 and 999,999.99 are allowed in the MONEYAMOUNT data type. An SQL data type is, in fact, comparable to a type in Pascal or a class in Java; the data type describes possible values.

From the previous CREATE TYPE statements, it is obvious that a user-defined data type depends upon a base data type. Moreover, user-defined data types can also refer to each other.

Example 33.2. Create the data type SMALL_MONEYAMOUNT.

CREATE TYPE SMALL_MONEYAMOUNT AS MONEYAMOUNT

Several types of user-defined data types exist. Those created earlier are called distinct data types. A distinct data type is directly or indirectly (through another distinct data type) defined upon an existing base data type. In the next sections, the other types are described.

One of the great advantages of working with user-defined data types is that apples cannot be compared to pears. The following SELECT statement was allowed with the original definition of the PENALTIES table, but not any longer:

SELECT  *
FROM    PENALTIES
WHERE   PAYMENTNO > AMOUNT

This was allowed because both columns were numeric. Now that the AMOUNT column is defined on the data type MONEYAMOUNT, it can be compared only to columns defined on that same data type. This might sound like a restriction but is actually an advantage. The condition in the SELECT statement was an odd question anyhow. In other words, the advantage of working with user-defined data types is that senseless statements are rejected. In the world of programming languages, this is called strong typing. Languages such as Algol, Pascal, and Java have supported this concept from the beginning. Note that it is possible to compare values of different data types, but then we have to indicate that clearly. We return to this topic later.

It is also easy to remove data types:

DROP TYPE MONEYAMOUNT

What happens if a data type is removed while columns are defined on it? The answer to this question is, again, that it depends on the product. Some products allow a data type to be removed only if there are no columns or other user-defined data types defined on it. Other products do allow the removal and replace the user-defined data type of the column with the underlying data type. In other words, the specification of the dropped data type is copied to all the columns with that data type.

In the literature on the relational model, instead of the term data type, the term domain is used regularly.

Exercise 33.1:

Create the data type NUMBER_OF_SETS and use it for the columns WON and LOST of the MATCHES table.

Access to Data Types

Usually, data types have an owner. The person who creates them is the owner of that data type. Other users can use the data type in their own CREATE TABLE statements, but they must be granted permission explicitly. A special version of the GRANT statement is introduced for granting this privilege.

Example 33.3. Give JIM permission to use the MONEYAMOUNT data type.

GRANT  USAGE
ON     TYPE MONEYAMOUNT
TO     JIM

Explanation: USAGE is the new form. After this GRANT statement is executed, JIM can define tables with columns based on this new data type.

Note that some products do not use the word USAGE for this; they use the word EXECUTE, just as for stored procedures. The meaning and effect are identical.

Of course, this statement also has a counterpart:

REVOKE   USAGE
ON       TYPE MONEYAMOUNT
TO       JIM

But what happens when the privilege is revoked after JIM has used the data type in a CREATE TABLE statement? The effect of this statement also depends on the product, but most products employ the following rule: The right to use a data type can be revoked only if the user has not used the data type yet.

Casting of Values

In Section 33.2, we indicated that the use of user-defined data types involves strong typing, but what if we nevertheless want to compare apples to pears? To this end, we have to change the data type of the values. For this, we use an explicit form of casting, as discussed in Section 5.11, in Chapter 5.

For each new data type, SQL automatically creates two new scalar functions for casting. One function transforms values of the user-defined data type to values of the underlying base data type (this function carries the name of the base data type); the other works the other way around and carries the name of the user-defined data type. These functions are called destructor and constructor, respectively. For the data type MONEYAMOUNT, the destructor is called DECIMAL and the constructor MONEYAMOUNT. Note that in object-oriented programming languages, these two terms are used to remove and create objects, respectively.

Example 33.4. Find the payment numbers of the penalties of which the penalty amount is greater than $50.

There are two equivalent formulations for this:

SELECT   PAYMENTNO
FROM     PENALTIES
WHERE    AMOUNT > MONEYAMOUNT(50)

and

SELECT   PAYMENTNO
FROM     PENALTIES
WHERE    DECIMAL(AMOUNT) > 50

Explanation: In the first SELECT statement, the value 50 (which is probably a “normal” number for the INTEGER data type) is transformed into a money amount. Then, it can be compared to comparable values in the AMOUNT column. Thus, the constructor MONEYAMOUNT constructs money amounts out of numeric values. The second statement shows that money amounts can be converted into “normal” numbers by using the destructor called DECIMAL. The result of both statements is, of course, the same.

Example 33.5. Find the payment numbers of the penalties of which the player number is greater than the penalty amount.

SELECT   PAYMENTNO
FROM     PENALTIES
WHERE    INTEGER(PLAYERNO) > INTEGER(PAYMENTNO)

Explanation: Because the PLAYERNO and the PAYMENTNO data types are created on the same base data type, which is INTEGER, they both have a destructor called INTEGER. In other words, now there are two functions with the same name, but they work on different data types. This does not cause any problems within SQL, which can keep the two functions apart because the parameters of the functions are different with respect to their data types. This concept, in which different functions carry the same name, is called overloading. The function name INTEGER is overloaded in this example.

Note that to change the data type of a value to compare it to values that have another data type is sometimes called semantic override.

Casting of values is also important when you enter new values with INSERT and UPDATE statements. Now that three columns in the PENALTIES table have a user-defined data type, we can no longer put simple numeric values in this column. We are forced to use casting with the INSERT statement now.

Example 33.6. Add a new penalty.

INSERT INTO PENALTIES (PAYMENTNO, PLAYERNO, PAY_DATE,
                       AMOUNT)
VALUES                (PAYMENTNO(12), PLAYERNO(6),
                      '1980-12-08', MONEYAMOUNT(100.00))

Creating User-Defined Operators

Just as in any programming language, SQL supports operators such as +, -, *, and /. We described them in Chapter 5. A few general remarks about these operators:

  • In theory, these operators are not required. For operators such as + and *, the functions ADD_UP and MULTIPLY could have been created. These operators have been added, however, to make things easier.

  • As stated, every base data type has a number of possible operations. For example, with the numeric data types, we can employ operations such as add, multiply, and subtract so that we can add a couple of months to the date data type and create a new date.

  • Overloading of functions is described in Section 33.4. Overloading of operators also exists: Whether we use the + for two numbers or for two alphanumeric values leads to completely different results. Depending on the data types of the values, the two numbers are added or the alphanumeric values are concatenated.

Some SQL products allow operators to be created for user-defined data types. In principle, these are the operations that apply to the underlying data type, but we can also define our own operations. SQL products enable you to do this only for scalar functions.

Let us continue with the discussion of the data type MONEYAMOUNT. Imagine that there are two columns, AMOUNT1 and AMOUNT2, that are both defined on this data type MONEYAMOUNT and that we want to add up. Because MONEYAMOUNT is not a normal numeric value, we cannot use the operators + and . The following expression would no longer be allowed:

AMOUNT1 + AMOUNT2

This must now be done with the following expression:

DECIMAL(AMOUNT1) + DECIMAL(AMOUNT2)

We can solve this more elegantly by also defining the + symbol for values of the MONEYAMOUNT data type.

CREATE FUNCTION "+" (MONEYAMOUNT, MONEYAMOUNT)
   RETURNS MONEYAMOUNT
   SOURCE "+" (DECIMAL(), DECIMAL())

Explanation: The + operator is defined once again, and again it is overloaded. It makes the expression AMOUNT1 + AMOUNT2 legal.

Imagine that the data type COLOR and the function MIX (to mix two colors) have been defined. Next, the + operator can be created, for example, as an operator to mix two colors.

CREATE FUNCTION "+" (COLOR, COLOR)
   RETURNS COLOR
   SOURCE MIX (COLOR, COLOR)

The capability to define user-defined operators does not increase the functionality of SQL, but it makes it easier to formulate certain statements.

Opaque Data Type

A distinct data type is based on one base data type and inherits all the features of that base data type. In addition, some products enable you to define completely new data types that are not dependent on a base data type. These are called opaque data types. Opaque means “nontransparent.” You could say that an opaque data type is a user-defined base data type.

Opaque data types are required when it is too complex to define them with the help of a base data type. For example, if we want to define the data type 2D coordinate, we must store two numbers somehow: the X and the Y coordinates. This does not work if we use only base data types. However, we can do it with opaque data types, as shown by the next example.

Example 33.7. Create the data type TWODIM to store two-dimensional coordinates.

CREATE TYPE TWODIM
      (INTERNALLENGTH = 4)

Explanation: What is clearly noticeable is that there is indeed no base data type used in the CREATE TYPE statement. The only thing that is registered is how much space one value of this type will occupy on disk—namely, 4 bytes, which has been selected because, for the sake of convenience, we assume that a coordinate consists of two whole numbers.

However, before this new data type can be used in a CREATE TABLE statement, we have to define a number of functions. We must create, for example, a function that converts a value, entered by the user, to something that is stored on hard disk and a function that works the other way around. This is not required for base data types. If we use the CHAR data type, we assume that these functions already exist. Now, we must create them ourselves. We do not go more deeply into this topic because it depends strongly on the product. We simply note that, in addition to the required functions, other functions can be defined to increase the functionality. In most cases, these are external functions.

Named Row Data Type

The third user-defined data type is the named row data type. With it, we can group values logically belonging to each other as one unit. For example, all values belonging to an address are grouped.

Example 33.8. Create the named row data type called ADDRESS and use it in a CREATE TABLE statement.

CREATE   TYPE ADDRESS AS
        (STREET      CHAR(15) NOT NULL,
         HOUSENO     CHAR(4),
         POSTCODE    CHAR(6),
         TOWN        CHAR(10) NOT NULL)

CREATE   TABLE PLAYERS
        (PLAYERNO    INTEGER PRIMARY KEY,
         NAME        CHAR(15),
         :           :
         RESIDENCE   ADDRESS,
         PHONENO     CHAR(13),
         LEAGUENO    CHAR(4))

ExplanationInstead of having to define four columns in the CREATE TABLE statement, only one will do: RESIDENCE. That means that in one row in the column RESIDENCE, not one value, but a row with four values, is stored. This row of four values has a name (or, in other words, is named) of ADDRESS, which explains the term named row. The column RESIDENCE is a composite column. For each column belonging to a named row data type, a NOT NULL specification can be included.

Of course, you can use a data type several times in the same CREATE TABLE statement, for example:

CREATE   TABLE PLAYERS
        (PLAYERNO          INTEGER PRIMARY KEY,
         :                 :
         RESIDENCE         ADDRESS,
         MAILING_ADDRESS   ADDRESS,
         HOLIDAY_ADDRESS   ADDRESS,
         PHONENO           CHAR(13),
         LEAGUENO          CHAR(4))

Working with composite columns affects the formulations of SELECT and other statements. We illustrate this with some examples.

Example 33.9. Get the numbers and complete addresses of the players resident in Stratford.

SELECT   PLAYERNO, RESIDENCE
FROM     PLAYERS
WHERE    RESIDENCE.TOWN = 'Stratford'

The result is:

PLAYERNO  <=============== RESIDENCE ===========>
          STREET          HOUSENO  POSTCODE  TOWN
--------  --------------  -------  --------  ---------
       6  Haseltine Lane  80       1234KK    Stratford
      83  Magdalene Road  16A      1812UP    Stratford
       2  Stoney Road     43       3575NH    Stratford
       7  Edgecombe Way   39       9758VB    Stratford
      57  Edgecombe Way   16       4377CB    Stratford
      39  Eaton Square    78       9629CD    Stratford
     100  Haseltine Lane  80       1234KK    Stratford

ExplanationIn the SELECT clause, only one column has to be specified instead of four. Obviously, the result consists of five columns. The notation RESIDENCE.TOWN is new. This point notation indicates that only a part of the address is requested.

Example 33.10. Get the numbers of the players living at the same address as player 6.

SELECT   OTHERS.PLAYERNO
FROM     PLAYERS AS P6, PLAYERS AS OTHERS
WHERE    P6.RESIDENCE = OTHERS.RESIDENCE
AND      P6.PLAYERNO = 6

Explanation: Instead of a join condition on four columns (STREET, HOUSENO, TOWN, and POSTCODE), one simple join condition, in which the composite column is used, is sufficient.

Casting of values is also important with named row data types. We give you an example of a SELECT and an INSERT statement.

Example 33.11. Get the number and name of the player living at the address 39 Edgecombe Way, Stratford, with postcode 9758VB.

SELECT   PLAYERNO, NAME
FROM     PLAYERS
WHERE    RESIDENCE =
         ADDRESS('Edgecombe Way', 39, '9758VB',
                 'Stratford')

Explanation: In this example, we can see clearly how the four values are cast into one ADDRESS value so that they can be compared with the column RESIDENCE.

Example 33.12. Enter a new player.

INSERT INTO PLAYERS
       (PLAYERNO, NAME, ..., ADDRESS, PHONENO, LEAGUENO)
VALUES (6, 'Parmenter', ...,
        ADDRESS('Haseltine Lane', 80, '1234KK',
                'Stratford'), '070-476537', 8467)

Named row data types are usually defined on base and distinct data types, but they can also be “nested.” An example is given next. First, the data type POSTCODE is defined, consisting of two components: a part of four digits and a part of two letters. Next, this new named row data type is used in the definition of the ADDRESS data type.

CREATE   TYPE POSTCODE AS
        (DIGITS     CHAR(4),
         LETTERS    CHAR(2))

CREATE   TYPE ADDRESS AS
        (STREET     CHAR(15) NOT NULL,
         HOUSENO    CHAR(4),
         POSTCODE   POSTCODE,
         TOWN       CHAR(10) NOT NULL)

Example 33.13. Get the numbers and the full addresses of the players resident in postcode area 2501.

SELECT   PLAYERNO, RESIDENCE
FROM     PLAYERS
WHERE    RESIDENCE.POSTCODE.DIGITS = '2501'

Example 33.14. Get the numbers and complete addresses of the players with postcode 1234KK.

SELECT   PLAYERNO, RESIDENCE
FROM     PLAYERS
WHERE    RESIDENCE.POSTCODE = POSTCODE('1234', 'KK')

Explanation: In the condition, two values are grouped into one value with a POSTCODE data type. A casting function is used for this.

In addition to the named row data type, some SQL products support the unnamed row data type. This data type also puts values together, but this group does not get a separate name.

CREATE   TABLE PLAYERS
        (PLAYERNO    INTEGER PRIMARY KEY,
         NAME        CHAR(15),
         :           :
         RESIDENCE   ROW (STREET     CHAR(15) NOT NULL,
                          HOUSENO    CHAR(4),
                          POSTCODE   CHAR(6),
                          TOWN       CHAR(10) NOT NULL),
         PHONENO     CHAR(13),
         LEAGUENO    CHAR(4))

Explanation: We can see that the four values are grouped together here. However, no data type is defined explicitly. The effect of an unnamed row data type on SELECT and other statements is the same as that of the named row data type. The difference, however, is that the specification cannot be reused in several places. If there is also a MAILING_ADDRESS column, we must define the four subcolumns once again.

For casting of values, the word ROW is used:

INSERT   INTO PLAYERS
        (PLAYERNO, NAME, ..., ADDRESS, PHONENO, LEAGUENO)
VALUES  (6, 'Parmenter', ...,
         ROW('Haseltine Lane', 80, '1234KK', 'Stratford'),
         '070-476537', 8467)

Exercise 33.2:

What is wrong in the following SELECT statement? (We assume that the situation is the same as in Example 33.14.)

SELECT   RESIDENCE
FROM     PLAYERS
WHERE    RESIDENCE LIKE '12%'

Exercise 33.3:

Create the data type RESULT, consisting of two columns called WON and LOST, and use this new data type at the MATCHES table.

The Typed Table

So far, we have used the named row data type only to specify columns, but this data type can also be used to assign a data type to a table. The result is that it is no longer necessary to specify the columns and their data types explicitly; instead, the columns of the named row data type must form the columns of the table.

Example 33.15. Create a type for the PLAYERS table.

CREATE   TYPE T_PLAYERS AS
        (PLAYERNO     INTEGER NOT NULL,
         NAME         CHAR(15) NOT NULL,
         INITIALS     CHAR(3) NOT NULL,
         BIRTH_DATE   DATE,
         SEX          CHAR(1) NOT NULL,
         JOINED       SMALLINT NOT NULL,
         STREET       CHAR(15) NOT NULL,
         HOUSENO      CHAR(4),
         POSTCODE     CHAR(6),
         TOWN         CHAR(10) NOT NULL,
         PHONENO      CHAR(13),
         LEAGUENO     CHAR(4))

CREATE   TABLE PLAYERS OF T_PLAYERS
        (PRIMARY KEY PLAYERNO)

Explanation: With the specification OF T_PLAYERS in the CREATE TABLE statement, we indicate that all the columns of the PLAYERS table are of that data type. Nevertheless, certain constraints must still be specified, and that explains the specification of the primary key. The NOT NULL integrity constraint is the only rule that can be included within the CREATE TYPE statement. A table that is defined in this way is called a typed table. Whether a table is typed or nontyped has no impact on SELECT and update statements.

The advantage of typed tables is that tables with the same structure can be defined in a very simple way. Imagine that there is another PLAYERS table consisting of players who used to be members of the tennis club. This table probably has the same columns, so it should now be easy to create the table:

CREATE   TABLE OLD_PLAYERS OF T_PLAYERS
        (PRIMARY KEY PLAYERNO)

Integrity Constraints on Data Types

Some SQL products enable you to specify integrity constraints on a data type. These integrity constraints restrict the permitted values of the data type and, thus, the populations of the columns defined on that data type.

Example 33.16. Define the data type NUMBER_OF_SETS and specify that only the values 1, 2, and 3 are legal.

CREATE   TYPE NUBER_OF_SETS AS SMALLINT
         CHECK (VALUE IN (0, 1, 2, 3))

CREATE   TABLE MATCHES
        (MATCHNO     INTEGER PRIMARY KEY,
         TEAMNO      INTEGER NOT NULL,
         PLAYERNO    INTEGER NOT NULL,
         WON         NUMBER_OF_SETS NOT NULL,
         LOST        NUMBER_OF_SETS NOT NULL)

Explanation: In the CREATE TYPE statement, a check integrity constraint is specified. This constraint indicates, with a condition, the legal values. Values are legal when they satisfy the condition. The reserved word VALUE stands for a possible value of that specific data type. Any simple condition can be used here, which means that comparison operators AND, OR, NOT, BETWEEN, IN, LIKE, and IS NULL all can be used. Subqueries are not allowed, however.

Now the advantage is that if the integrity constraint for NUMBER_OF_SETS changes, this has to be carried out only in one place.

Example 33.17. Change the data type NUMBER_OF_SETS so that the value 4 is also permitted.

ALTER   TYPE NUMBER_OF_SETS AS SMALLINT
        CHECK (VALUE BETWEEN 0 AND 4)

When a condition is changed, a problem could arise if we make the condition more restrictive. Imagine that NUMBER_OF_SETS is defined as only the values 0, 1, and 2. What happens if the columns defined on this data type already have a value that is beyond this range? Products solve this by not allowing such a change of the data type to occur. First, the columns must be adjusted.

Keys and Indexes

Primary keys, foreign keys, and indexes can be created on columns with user-defined data types. For the named row data types, they can be defined on the full value or on a part of it.

Example 33.18. Define an index on the column RESIDENCE in the PLAYERS table.

CREATE   INDEX I_RESIDENCE
   ON PLAYERS(RESIDENCE)

Example 33.19. Define an index on only the POSTCODE part of the column RESIDENCE in the PLAYERS table.

CREATE INDEX I_RESIDENCE
   ON PLAYERS(RESIDENCE.POSTCODE)

The only exception is when indexes on opaque data types must be defined. Each product offers very different features here.

Answers

33.1

CREATE TYPE NUMBER_OF_SETS AS TINYINT

CREATE TABLE MATCHES
     ( MATCHNO     INTEGER NOT NULL PRIMARY KEY,
       TEAMNO      INTEGER NOT NULL,
       PLAYERNO    INTEGER NOT NULL,
       WON         NUMBER_OF_SETS NOT NULL,
       LOST        NUMBER_OF_SETS NOT NULL)

33.2

The FROM clause of the statement is correct, but the WHERE clause is not. The LIKE operator cannot be executed on the compound column just like that. A correct alternative is:

SELECT   RESIDENCE
FROM     PLAYERS
WHERE    RESIDENCE.POSTCODE LIKE '12%'

33.3

CREATE TYPE RESULT AS
     ( WON     NUMBER_OF_SETS,
       LOST    NUMBER_OF_SETS)

CREATE TABLE MATCHES
     ( MATCHNO    INTEGER NOT NULL PRIMARY KEY,
       TEAMNO     INTEGER NOT NULL,
       PLAYERNO   INTEGER NOT NULL,
       RESULT     RESULT NOT NULL)
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.65.247