12.4. SQL: Identifiers and Data Types

For a given schema, some database objects such as base tables, views, domains, and constraints are identified by name. A column is identified by appending its (local) name to the name of its table, using a “dot notation”. This distinguishes columns with the same local name in different tables. For example, the second columns of the tables Subject ( subiectCode, title, credit) and Book ( isbn, title) are identified as “Subject.title” and “Book.title”, respectively. In SQL, all names (including local names) are called identifiers, even though local names provide identification only within a limited context.

In SQL-89, identifiers were restricted to at most 18 characters, and all letters had to be in uppercase. From SQL-92 onward, an identifier is either regular or delimited. A regular identifier is a string of at most 128 characters, the first of which must be a letter (“a”..“z”,“A”,..“Z”). Each later character must be a letter, digit (“0”..“9”), or underscore (“_”). Moreover, no reserved word may be used as a regular identifier. You may, however, use a reserved word as part of an identifier. For example, “note” is fine as an identifier even though it includes the reserved word “not”.

Commercial SQLs may require shorter identifiers, or allow some other characters. In Oracle, the length of identifiers is restricted to 30 characters. Microsoft SQL Server allows identifiers of up to 128 characters, but also allows the first character to be “_”, “#”, or “@”, and later characters to be “#”, “@”, or “$”. It uses “#” to start names of temporary objects and “@” to start variable names.

Some examples are shown in Table 12.3. Names in the third row cannot be used as regular identifiers since they start with a digit or include an illegal character (e.g., a space or parenthesis). Names in the fourth to sixth rows were first introduced as reserved words in SQL-92, SQL: 1999, and SQL:2003, respectively.

Table 12.3. Which names may be used as regular identifiers?
Allowed as (regular) identifiers?SQL-89SQL-92SQL: 1999SQL:2003
A, R2D2, CUSTOMER_NRYesYesYesYes
a, This_is_a lonz_identifier, CustomerNrNoYesYesYes
2B, CUSTOMER NR, SPEED(MPH)NoNoNoNo
date, first, levelYesNoNoNo
before, row, triggerYesYesNoNo
multiset, xml, bigintYesYesYesNo
bitYesNoNoYes

Since the list of reserved words grows with each new standard, identifiers in existing applications might become illegal at a later stage. For example, Table 12.3 shows that many words became reserved for the first time in SQL-92, SQL: 1999, and SQL:2003, and some words (such as bit) may be reserved words in some versions of the standard, but withdrawn in later versions. Different SQL dialects may omit some of these words from their reserved word lists, while adding others, which makes portability between dialects even harder. Partly to avoid such problems, the SQL-92 standard introduced delimited identifiers.

A delimited identifier is a string of at most 128 characters, delimited by (i.e., enclosed in) double quotes. Any character at all may be used, as well as reserved words, within the double quotes. For example, the following are legal delimited identifiers: “customer nr”, “speed (km/h)”, “&^%!!”, “date”, “group”. Unlike regular identifiers, delimited identifiers are case sensitive (i.e., uppercase letters are not equated to lowercase letters). For example, the delimited identifiers “invoiceNr”, “InvoiceNr”, and “INVOICENR” are unequal, but the regular identifiers InvoiceNr and INVOICENR are equal.

In SQL, a string constant is delimited by single quotes. For example, ‘USA’ might be a value in a countryCode column. Some commercial SQLs allow string constants to be delimited by double quotes instead of single quotes, e.g., “USA”. This practice should be discouraged, since it conflicts with the now standard use of double quotes for delimited identifiers. In Microsoft SQL Server, the command “set quoted_identifer on” ensures that double quotes are used only for delimited identifiers, forbidding their use to delimit string constants. SQL Server also allows square brackets ([...]) as an alternative to double quotes (“...”).

Words that were reserved may cease to be so in later versions. For example, the names avg, between, exists, and sum were downgraded from reserved words to non reserved keywords in SQL: 1999. Keywords are words that have predefined meanings. Reserved words are keywords that cannot be used as regular identifiers.

As mentioned earlier, a reserved word may be embedded within an identifier. Suppose a column is to store the names of tutorial groups. Since group is a reserved word, we cannot use this (unquoted) for the column name. However, we may add quotes to make it a delimited identifier, or include “group” in a longer regular identifier (e.g., “group”, TuteGroup, GroupName).

A first impression of how a given SQL dialect compares with the standards may be gained by inspecting its list of keywords, and especially its reserved words. Where we deal with SQL in this book, keywords are usually distinguished by displaying them in bold.

Values entered in a table column belong to the data type declared for that column. Table 12.4 lists the standard data types for various versions of the SQL standard. Here, square brackets indicate optional components. All SQL dialects support at least character string and numeric types.

Table 12.4. Standard data types.
StandardData types
SQL-89introduced:fixed-length stringchar, character (n), char (n)
exact numericsmallint
int
numeric (ρ[,s])
decimal (ρ[,s])
approximate numericfloat [p]
real
double precision
SQL-92introduced:variable-length stringvarchar (n), character varying (n)
national character stringsnational character (n),

national character varying (n)

and short forms such as nchar(n)
bit stringbit (n) bit varying (n)
datetimedate{year, month, day}

time {hour, minute, second}

timestamp {date and time}

(time and timestamp can be with or without time zone)
intervalyear-month periods day-time periods
SQL: 1999introduced:large objectclob (n){character large object}

nclob (n) {national character large object}

blob (n){binary large object}

and various lengthier alternatives
boolean typeboolean
row typerow (field-definition [,...]
collection typedata-type array(n)
user defined type (UDT) 
reference typeref (UDT)
SQL: 2003introducedexact numericbigint
collection typedata-type multiset
xml dataxml
withdrawn bit (n) bit varying (n)

In the standard, “char(n)” means the value is stored as a fixed length string of n characters. If the value has fewer than n characters, blanks are appended to fill out the length. If no size (rc) is specified, this is treated as a string with only one character. A value of type varchar(rc) is stored as a string of at most n characters. If the value is shorter, it is stored as it is, without padding it with extra blanks. A varchar(rc) string also has an “end-of-string” marker, the size of which varies between different implementations. This has to be taken into account when estimating the storage space required. For short strings with little variation in length char(n) may be the preferred choice. SQL-92 allows various national character sets to be declared. The use of “nchar” or equivalent syntactic variations indicates that characters are selected from the designated national character set—this implies that the character encoding uses Unicode, which typically requires two bytes of storage per character instead of the one byte per character that was commonplace for encoding the standard English characters and symbols (often referred to as the ASCII code, from the name of the standards body).

With the numeric data type, the precision p is the maximum number of digits included in the number, and the scale s is the number of digits after the decimal point. For example, columns declared numeric(6, 2) allow values in the range -9999.99 to.+9999.99. The decimal type is like the numeric type except that an implementation may sometimes provide a precision greater than p. Many systems implement numeric and decimal as the same type. The integer (including smallint and bigint) data types allow integers only (no fractions). The three approximate numeric data types allow very large or very small numbers to be stored to a specified precision as a mantissa times an exponent of 10. The abbreviations char, dec, int, and nchar may be expanded to character, decimal, integer, and national character.

As set out in Table 12.4, SQL-92 provides direct support for bit strings as well as time points and time intervals. Times are local unless the with time zone option is specified. This option includes the offset from UTC (Universal Time, Coordinated, formerly called Greenwich Mean Time (GMT)). Various temporal operators are provided (e.g., to allow computation of intervals by subtracting one time point from another).

In SQL: 1999, all of the previous data types are called predefined types. As Table 12.4 indicates, SQL: 1999 adds four more predefined types: character large object, national character large object, binary large object, and boolean (true or false). In addition, SQL: 1999 allows row types, user-defined types, reference types, and collection types. A row type is basically a sequence of fields and can be used as the basis for defining a table. A user-defined type (UDT) is identified by its name and may include a list of method specifications. If based on a single, predefined type, it is called a distinct type. If specified as a list of attribute definitions, it is called a structured type. The only collection type constructor allowed in SQL: 1999 was array. The SQL:2003 standard added support for multiset (bags) as a collection type, large integers (bigint), and, significantly, xml as a recognized data type within SQL (we’ll return to the xml type in the next chapter).

Currently, most commercial SQLs support all the SQL-89 data types, almost all the additional SQL-92 data types, a few of the extra SQL: 1999 and SQL:2003 data types, and some additional non-standard data types (e.g., money). Many differences exist. For example, Oracle treats varchar as char, and uses varchar2 to denote the standard varchar. Informix implemented sets, multisets and lists years ago. SQL Server includes many extra types such as tinyint, money, smallmoney, and uniqueidentifier. The range of most standard types is usually implementation defined rather than standardized. For example, the range for smallint is often, but not always, -32,768..32,767.

Exercise 12.4

1.Which of the following are legal identifiers in which SQL standard, if shifted to uppercase?
  1. Payroll#

  2. PayrollNr

  3. “Payroll#”

  4. 1994Tax

  5. Tax in 1994

  6. “Tax in 1994”

  7. Tax_in_1994

  8. Deposit_in_$

  9. Mass_(kg)

  10. Order

  11. WorldwideWebldentifier

  12. count

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.220.16.184