Chapter 2. Foundational Concepts

SQL provides an easy, intuitive way to interact with a database. While the SQL standard does not define the concept of a “database,” it does define all the functions and concepts needed for a user to create, retrieve, update, and delete data. It is important to know the types of syntax in the ANSI/ISO SQL standard and the particular platform-specific syntax guidelines. This chapter will provide you with a grounding in those areas. For brevity, we will refer to the ANSI/ISO standard as, simply, SQL or “the SQL standard” in the remainder of this chapter.

Database Platforms Described in This Book

SQL in a Nutshell, Fourth Edition describes the SQL standard and the platform-specific implementations of several leading RDBMSs:

MySQL / MariaDB
MySQL is a popular open source DBMS that is known for its ease of use and good performance. It runs on numerous operating systems, including most Linux variants. To improve performance, it has a slimmer feature set than many other DBMSs. Since the purchase of Sun Microsystems by Oracle, the MySQL user-base has been split into 2-factions - MySQL (maintained by Oracle) and MariaDb (by MariaDB Foundation whose head, Monty Widenius, was the original creator of MySQL). This book covers MySQL 8, now owned by Oracle, and the most popular MySQL fork MariaDB 10.5. Both have more or less equivalent and compatible offering of functionality and MariaDb does pull in changes from MySQL core. Where they deviate most is in the storage engines they offer and their release cycle. At some point in the near future, MySQL and MariaDB will be divergent enough to warrant distinct entries throughout this book. For now, they are maintained as one.
Oracle
Oracle is a leading RDBMS in the commercial sector. Oracle was the first commercially available SQL database platform, released in the summer of 1979, running on Vax computers as Oracle v2. Since that time, Oracle has grown to run on a multitude of operating systems and hardware platforms. Its scalable, reliable architecture has made it the platform of choice for many users. In this edition, we cover Oracle Database 19c.
PostgreSQL
PostgreSQL is the most feature-rich open source database platform available. For the last several years, PostgreSQL has seen a steep rise in popularity with a strongly upward trend. PostgreSQL is best known for its excellent support for ANSI/ISO standards and robust transaction processing capabilities, as well as its rich data type and database object support. In addition to its full set of features, PostgreSQL runs on a wide variety of operating systems and hardware platforms. This book covers PostgreSQL 13.
SQL Server
Microsoft SQL Server is a popular RDBMS that runs on the Windows and Linux operating systems. Its features include ease of use, an all-inclusive feature set covering OLTP and analytic workloads, low cost, and high performance. This book covers Microsoft SQL Server 2019.

Categories of Syntax

To begin to use SQL, readers should understand how statements are written. SQL syntax falls into four main categories. Each category is introduced in the following list and then explained in further detail in the sections that follow:

Identifiers
Describe a user- or system-supplied name for a database object, such as a database, a table, a constraint on a table, a column in a table, a view, etc.
Literals
Describe a user- or system-supplied string or value that is not otherwise an identifier or a keyword. Literals may be strings like “hello”, numbers like 1234, dates like “Jan 01, 2002”, or Boolean values like TRUE.
Operators
Are symbols specifying an action to be performed on one or more expressions, most often in DELETE, INSERT, SELECT, or UPDATE statements. Operators are also used frequently in the creation of database objects.
Reserved words and keywords
Have special meaning to the database SQL parser. Keywords such as SELECT, GRANT, DELETE, or CREATE are words that cannot be used as identifiers within the database platform. These are usually commands or SQL statements. Reserved words are words that may become reserved some time in the future. Elsewhere in the book, we use the term keyword to describe both concepts. You can circumvent the restriction on using reserved words and keywords as identifiers by using quoted identifiers, which will be described in a moment. However, this is not recommended since a single typo could play havok with your code.

Identifiers

In its simplest terms, an identifier is the name of an object you create on your database platform. However, you can create identifiers at a variety of levels within a database. So lets start at the top. In ANSI terms, clusters contain sets of catalogs, catalogs contain sets of schemas, schemas contain sets of objects, and so on. Most database platforms use corollary terms: instances contain one or more databases; databases contain one or more schemas; and schemas contain one or more tables, views, or stored procedures, and the privileges associated with each object. At each level of this structure, items require unique names (that is, identifiers) so that they can be referenced by programs and system processes. This means that each object (whether a database, table, view, column, index, key, trigger, stored procedure, or constraint) in an RDBMS must be identified. When issuing the command that creates a database object, you must specify an identifier (i.e., a name) for that new object.

There are two important categories of rules that experienced developers keep in mind when choosing an identifier for a given item:

Naming conventions
Are logical rules of thumb that govern how database designers name objects. Consistently following these rules ultimately creates better database structures and enables improved data tracking. These are not so much SQL requirements as the distilled experience of practiced programmers.
Identifier rules
Are naming rules set by the SQL standard and implemented by the platforms. These rules govern characteristics such as how long a name may be. These identifier conventions are covered for each vendor later in this chapter.

Naming conventions

Naming conventions establish a standard baseline for choosing object identifiers. In this section, we present a list of naming conventions (rules for picking your identifiers) that are based on long years of experience. The SQL standard has no comment on naming conventions outside of the uniqueness of an identifier, its length, and the characters that are valid within the identifier. However, here are some conventions that you should follow:

Select a name that is meaningful, relevant, and descriptive
Avoid names that are encoded, such as a table named XP21, and instead use human-readable names like Expenses_2021, so that others can immediately know that the table stores expenses for the year 2021. Remember that those developers and DBAs maintaining the database objects you create incur a burden on those maintaining, perhaps long after you have gone, and the names you use should make sense at a glance. Each database vendor has limits on object name size, but names generally can be long enough to make sense to anyone reading them.
Choose and apply the same case throughout
Use either all uppercase or all lowercase for all objects throughout the database. Some database servers are case-sensitive, so using mixed-case identifiers might cause problems later. Many ORM products, such as Entity Framework, default to camelcase notation. This may cause problems later down the road, if you need to port your application to database platforms which are case sensitive.For Oracle you should use all uppercase and for PostgreSQL use all lower case.
Use abbreviations consistently
Once you’ve chosen an abbreviation, use it consistently throughout the database. For example, if you use EMP as an abbreviation for EMPLOYEE, you should use EMP throughout the database; do not use EMP in some places and EMPLOYEE in others.
Use complete, descriptive, meaningful names with underscores for reading clarity
A column name like UPPERCASEWITHUNDERSCORES is not as easy to read as UPPERCASE_WITH_UNDERSCORES.
Do not put company or product names in database object names
Companies get acquired, and products change names. These elements are too transitory to be included in database object names.
Do not use overly obvious prefixes or suffixes
For example, don’t use DB_ as a prefix for a database, and don’t prefix every view with V_. Simple queries to the system table of the database can tell the DBA or database programmer what type of object an identifier represents.
Do not fill up all available space for the object name
If the database platform allows a 32-character table name, try to leave at least a few free characters at the end. Some database platforms append prefixes or suffixes to table names when manipulating temporary copies of the tables.
Do not use quoted identifiers
Quoted identifiers are object names stored within double quotation marks. (The ANSI standard calls these delimited identifiers.) Quoted identifiers are also case-sensitive. Encapsulating an identifier within double quotes allows creation of names that may be difficult to use and may cause problems later. For example, users could embed spaces, special characters, mixed-case characters, or even escape sequences within a quoted identifier, but some third-party tools (and even vendor-supplied tools) cannot handle special characters in names. Therefore, quoted identifiers should not be used.
Note

Some platforms allow delimiting symbols other than double quotes. For example, SQL Server uses brackets ([ ]) to designate quoted identifiers.

There are several benefits to following a consistent set of naming conventions. First, your SQL code becomes, in a sense, self-documenting, because the chosen names are meaningful and understandable to other users. Second, your SQL code and database objects are easier to maintain—especially for other users who come later—because the objects are consistently named. Finally, maintaining consistency increases database functionality. If the database ever has to be transferred or migrated to another RDBMS, consistent and descriptive naming saves both time and energy. Giving a few minutes of thought to naming SQL objects in the beginning can prevent problems later.

Identifier rules

Identifier rules are rules for identifying objects within the database that are rigidly enforced by the database platforms. These rules apply to normal identifiers, not quoted identifiers. Rules specified by the SQL standard generally differ somewhat from those of specific database vendors. Table 2-1 contrasts the SQL rules with those of the RDBMS platforms covered in this book.

Table 2-1. Table 2-1. Platform-specific rules for regular object identifiers (excludes quoted identifiers)
Characteristic Platform Specification
Identifier size
SQL 128 characters.
MySQL 64 characters; aliases may be 255 characters.
Oracle 30 bytes (number of characters depends on the character set); database names are limited to 8 bytes; database links are limited to 128 bytes.
PostgreSQL 63 characters (NAMEDATALEN property minus 1).
SQL Server 128 characters; temp tables are limited to 116 characters.
Identifier may contain
SQL Any number or character, and the underscore (_) symbol.
MySQL Any number, character, or symbol. Cannot be composed entirely of numbers.
Oracle Any number or character, and the underscore (_), pound sign (#), and dollar sign ($) symbols (though the last two are discouraged). Database links may also contain a period (.).
PostgreSQL Any number or character or _. Unquoted upper case characters are equivalent to lower case.
SQL Server Any number or character, and the underscore (_), at sign (@), pound sign (#), and dollar sign ($) symbols.
Identifier must begin with
SQL A letter.
MySQL A letter or number. Cannot be composed entirely of numbers.
Oracle A letter.
PostgreSQL A letter or underscore (_).
SQL Server A letter, underscore (_), at sign (@), or pound sign (#).
Identifier cannot contain
SQL Spaces or special characters.
MySQL Period (.), slash (/), or any ASCII(0) or ASCII(255) character. Single quotes () and double quotes (“ “) are allowed only in quoted identifiers. Identifiers should not end with space characters.
Oracle Spaces, double quotes (“ “), or special characters.
PostgreSQL Double quotes (“ “).
SQL Server Spaces or special characters.
Allows quoted identifiers
SQL Yes.
MySQL Yes.
Oracle Yes.
PostgreSQL Yes.
SQL Server Yes.
Quoted identifier symbol
SQL Double quotes (“ “).
MySQL Single quotes () or double quotes (“ “) in ANSI compatibility mode.
Oracle Double-quotes (“ “).
PostgreSQL Double-quotes (“ “).
SQL Server Double quotes (“ “) or brackets ([ ]); brackets are preferred.
Identifier may be reserved
SQL No, unless as a quoted identifier.
MySQL No, unless as a quoted identifier.
Oracle No, unless as a quoted identifier.
PostgreSQL No, unless as a quoted identifier.
SQL Server No, unless as a quoted identifier.
Schema addressing
SQL Catalog.schema.object
MySQL Database.object.
Oracle Schema.object.
PostgreSQL Database.schema.object.
SQL Server Server.database.schema.object.
Identifier must be unique
SQL Yes.
MySQL Yes.
Oracle Yes.
PostgreSQL Yes.
SQL Server Yes.
Case Sensitivity
SQL No.
MySQL Only if underlying filesystem is case sensitive (e.g., Mac OS or Unix). Triggers, logfile groups, and tablespaces are always case sensitive.
Oracle No by default, but can be changed.
PostgreSQL No.
SQL Server No by default, but can be changed.
Other rules
SQL None.
MySQL May not contain numbers only.
Oracle Database links are limited to 128 bytes and may not be quoted identifiers.
PostgreSQL None.
SQL Server Microsoft commonly uses brackets rather than double quotes for quoted identifiers.

Identifiers must be unique within their scope. Thus, given our earlier discussion of the hierarchy of database objects, database names must be unique on a particular instance of a database server, while the names of tables, views, functions, triggers, and stored procedures must be unique within a particular schema. On the other hand, a table and a stored procedure can have the same name, since they are different types of object. The names of columns, keys, and indexes must be unique on a single table or view, and so forth. Check your database platform’s documentation for more information—some platforms require unique identifiers where others may not. For example, Oracle requires that all index identifiers be unique throughout the database, while others (such as SQL Server) require that the index identifier be unique only for the table on which it depends.

Remember, quoted identifiers (object names encapsulated within a special delimiter, usually double quotes or brackets [LikeThis]) may be used to break some of the identifier rules specified earlier. One example is that quoted identifiers are case sensitive—that is, “foo does not equal “FOO” or “Foo”. Furthermore, quoted identifiers may be used to bestow a reserved word as a name, or to allow normally unusable characters and symbols within a name. For instance, you normally can’t use the percent sign (%) in a table name. However, you can, if you must, use that symbol in a table name so long as you always enclose that table name within double quotes. That is, to name a table expense%%ratios, you would specify the name in quotes: “expense%%ratios” or [expense%%ratios]. Again, remember that in SQL such names are sometimes known as “delimited identifiers.”

Note

Once you have created an object name as a quoted identifier, we recommend that users always reference it using its special delimiter. Inconsistency often leads to problematic or poorly performing code.

Literals

SQL defines a literal value as any explicit numeric value, character string, temporal value (e.g., date or time), or Boolean value that is not an identifier or a keyword. SQL databases allow a variety of literal values in a SQL program. Literal values are allowed for most of the numeric, character, Boolean, and date data types. For example, SQL Server numeric data types include (among others) INTEGER, REAL, and MONEY. Thus, numeric literals can look like:

        30
        −117
        +883.3338
        −6.66
        $70000
        2E5
        7E-3
        

As these examples illustrate, SQL Server allows signed and unsigned numerals, in scientific or normal notation. And since SQL Server has a money data type, even a dollar sign can be included. SQL Server does not allow other symbols in numeric literals (besides 0 1 2 3 4 5 6 7 8 9 + - $ . E e), however, so exclude commas or periods, in European countries where a comma is used in place of a period in decimal or monetary values. Most databases interpret a comma in a numeric literal as a list item separator. Thus, the literal value 3,000 would likely be interpreted as two values: 3 and, separately, 000.

       Boolean, character string, and date literals look like:
       TRUE
       'Hello world!'
       'OCT-28-1966 22:14:30:00'
       

Character string literals should always be enclosed in single quotation marks (''). This is the standard delimiter for all character string literals. Character string literals are not restricted just to the letters of the alphabet. In fact, any character in the character set can be represented as a string literal. All of the following are string literals:

        '1998'
        '70,000 + 14000'
        'There once was a man from Nantucket,'
        'Oct 28, 1966'
        

and are compatible with the CHARACTER data type. Remember not to confuse the string literal ‘1998’ with the numeric literal 1998. Once string literals are associated with a character data type, it is poor practice to use them in arithmetic operations without explicitly converting them to a numeric data type. Some database products will perform automatic conversion of string literals containing numbers when comparing them against any DATE or NUMBER data type values, but not all. On some database platforms, performance declines when you do not explicitly convert such data types.

By doubling the delimiter, you can effectively represent a single quotation mark in a literal string, if necessary. That is, you can use two quotation marks each time a single quotation mark is part of the value. This example, taken from SQL Server, illustrates the idea:

        SELECT 'So he said ''Who''s Le Petomaine?'''

This statement gives the following result:

        -----------------
        So he said 'Who's Le Petomaine?'
        

Operators

An operator is a symbol specifying an action to be performed on one or more expressions. Operators are used most often in DELETE, INSERT, SELECT, and UPDATE statements, but they are also used frequently in the creation of database objects such as stored procedures, functions, triggers, and views.

Operators typically fall into these categories:

Arithmetic operators
Supported by all databases
Assignment operators
Supported by all databases
Bitwise operators
Supported by MySQL and SQL Server
Comparison operators
Supported by all databases
Logical operators
Supported by all databases
Unary operators
Supported by MySQL, Oracle, and SQL Server

Arithmetic operators

Arithmetic operators perform mathematical operations on two expressions of any data type in the numeric data type category. See Table 2-2 for a listing of the arithmetic operators.

Table 2-2. Arithmetic operators

Arithmetic operator Meaning
+ Addition
Subtraction
* Multiplication
/ Division
% Modula (SQL Server only); returns the remainder of a division operation as an integer value
Note

In MySQL, Oracle, and SQL Server, the + and operators can be used to perform arithmetic operations on date values.

The various platforms also offer their own unique methods for performing arithmetic operations on date values.

Assignment operators

Except in Oracle, which uses :=, the assignment operator (=) assigns a value to a variable or the alias of a column heading. In all of the database platforms covered in this text, the keyword AS may serve as an operator for assigning table- or column-heading aliases.

Bitwise operators

Both Microsoft SQL Server and MySQL provide bitwise operators (see Table 2-3) as a shortcut to perform bit manipulations between two-integer expressions. Valid data types that are accessible to bitwise operators include BINARY, BIT, INT, SMALLINT, TINYINT, and VARBINARY. PostgreSQL supports the BIT and BIT VARYING data types; it also supports the bitwise operators AND, OR, XOR, concatenation, NOT, and shifts left and right.

Table 2-3. Bitwise operators

Bitwise operator Meaning
& Bitwise AND (two operands)
| Bitwise OR (two operands)
^ Bitwise exclusive OR (two operands)

Comparison operators

Comparison operators test whether two expressions are equal or unequal. The result of a comparison operation is a Boolean value: TRUE, FALSE, or UNKNOWN. Also, note that the ANSI standard behavior for a comparison operation where one or more of the expressions is NULL is to return NULL. For example, the expression 23 + NULL returns NULL, as does the expression Feb 23, 2022 + NULL. See Table 2-4 for a list of the comparison operators.

Table 2-4. Comparison operators

Comparison operator Meaning
= Equal to
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
<> Not equal to
!= Not equal to (not ANSI standard)
!< Not less than (not ANSI standard)
!> Not greater than (not ANSI standard)

Boolean comparison operators are used most frequently in a WHERE clause to filter the rows that qualify for the search conditions. The following example uses the greater than or equal to comparison operation:

          SELECT *
           FROM Products
           WHERE ProductID >= 347
          

Logical operators

Logical operators are commonly used in a WHERE clause to test for the truth of some condition. They return a Boolean value of either TRUE or FALSE. Table 2-5 shows a list of logical operators. Note that not all database systems support all operators.

Table 2-5. Logical operators

Logical operator Meaning
ALL TRUE if all of a set of comparisons are TRUE
AND TRUE if both Boolean expressions are TRUE
ANY TRUE if any one of a set of comparisons is TRUE
BETWEEN TRUE if the operand is within a range
EXISTS TRUE if a subquery contains any rows
IN TRUE if the operand is equal to one of a list of expressions or one or more rows returned by a subquery
LIKE TRUE if the operand matches a pattern
NOT Reverses the value of any other Boolean operator
OR TRUE if either Boolean expression is TRUE
SOME TRUE if some of a set of comparisons are TRUE

Unary operators

Unary operators perform an operation on only one expression of any of the data types in the numeric data type category. Unary operators may be used on any numeric data type, though the bitwise operator (~) may be used only on integer data types (see Table 2-6).

Table 2-6. Unary operators

Unary operator Meaning
+ Numeric value is positive
Numeric value is negative
~ A bitwise NOT; returns the complement of the number (not in Oracle)

Operator precedence

Sometimes operator expressions become rather complex, with multiple levels of nesting. When an expression has multiple operators, operator precedence determines the sequence in which the operations are performed. The order of execution can significantly affect the resulting value.

Operators have different precedence levels. An operator on a higher level is evaluated before an operator on a lower level. The following listing shows the operators’ precedence levels, from highest to lowest:

  • () (parenthetical expressions)
  • +, , ~ (unary operators)
  • *, /, % (mathematical operators)
  • +, (arithmetic operators)
  • =, >, <, >=, <=, <>, !=, !>, !< (comparison operators)
  • ^ (bitwise exclusive OR), & (bitwise AND), | (bitwise OR)
  • NOT
  • AND
  • ALL, ANY, BETWEEN, IN, LIKE, OR, SOME
  • = (variable assignment)

Operators are evaluated from left to right when they are of equal precedence. However, parentheses are used to override the default precedence of the operators in an expression. Expressions within a set of parentheses are evaluated first, while operations outside the parentheses are evaluated next.

For example, the following expressions in an Oracle query return very different results:

          SELECT 2 * 4 + 5 FROM dual
          -- Evaluates to 8 + 5, which yields an expression result of 13.
          SELECT 2 * (4 + 5) FROM dual
          -- Evaluates to 2 * 9, which yields an expression result of 18.

In expressions with nested parentheses, the most deeply nested expression is evaluated first.

This next example contains nested parentheses, with the expression 5 - 3 appearing in the most deeply nested set of parentheses. This expression 5 - 3 yields a value of 2. Then the addition operator (+) adds this result to 4, which yields a value of 6. Finally, the 6 is multiplied by 2 to yield an expression result of 12:

          SELECT 2 * (4 + (5 - 3) ) FROM dual
          -- Evaluates to 2 * (4 + 2), which further evaluates to 2 * 6,
          -- and yields an expression result of 12.
          RETURN
Note

We recommend using parentheses to clarify precedence in all complex queries.

System delimiters and operators

String delimiters mark the boundaries of a string of alphanumeric characters. Somewhat similar to the way in which keywords and reserved words have special significance to you database server, system delimiters are those symbols within the character set that have special significance to your database server. Delimiters are symbols that are used to judge the order or hierarchy of processes and list items. Operators are those delimiters used to judge values in comparison operations, including symbols commonly used for arithmetic or mathematical operations. Table 2-7 lists the system delimiters and operators allowed by SQL.

Table 2-2. SQL delimiters and operators
Symbol Usage Example
+ Addition operator; in SQL Server, also serves as a concatenation operator On all database platforms:
--- SELECT MAX(emp_id) + 1 FROM employee ---
- Subtraction operator; also serves as a range indicator in CHECK constraints As a subtraction operator:
--- SELECT MIN(emp_id) - 1 FROM employee ---
As a range operator, in a CHECK constraint:
--- ALTER TABLE authors ADD CONSTRAINT authors_zip_num CHECK (zip LIKE %[0-9]%) ---
* Multiplication operator --- SELECT salary * 0.05 AS bonus FROM employee; --
/ Division operator --- SELECT salary / 12 AS monthly FROM employee; ---
= Equality operator --- SELECT * FROM employee WHERE lname = Fudd ---
<, > Inequality operators (!= is a nonstandard equivalent on several platforms) On all platforms:
--- SELECT * FROM employee WHERE lname <> Fudd ---
< Less than operator --- SELECT lname, emp_id, (salary * 0.05) AS bonus FROM employee WHERE (salary * 0.05) ⇐ 10000 AND exempt_status < 3 ---
<= Less than or equal to operator
> Greater than operator --- SELECT lname, emp_id, (salary * 0.025) AS bonus FROM employee WHERE (salary * 0.025) > 10000 AND exempt_status >= 4 ---
>= Greater than or equal to operator
() Used in expressions and function calls, to specify order of operations, and as a subquery delimiter Expression:
--- SELECT (salary / 12) AS monthly FROM employee WHERE exempt_status >= 4 --
Function call:
--- SELECT SUM(travel_expenses) FROM “expense%%ratios” --
Order of operations:
--- SELECT (salary / 12) AS monthly, ((salary / 12) / 2) AS biweekly FROM employee WHERE exempt_status >= 4 --
Subquery:
--- SELECT * FROM stores WHERE stor_id IN (SELECT stor_id FROM sales WHERE ord_date > 01-JAN-2004) ---
% Wildcard attribute indicator --- SELECT * FROM employee WHERE lname LIKE Fud% ---
, List item separator --- SELECT lname, fname, ssn, hire_date FROM employee WHERE lname = Fudd ---
. Identifier qualifier separator --- SELECT * FROM scott.employee WHERE lname LIKE Fud% ---
Character string indicators --- SELECT * FROM employee WHERE lname LIKE FUD% OR fname = ELMER ---
Quoted identifier indicators --- SELECT expense_date, SUM(travel_expense) FROM “expense%%ratios” WHERE expense_date BETWEEN 01-JAN-2004 AND 01-APR-2004 ---
Single-line comment delimiter (two dashes followed by a space) ---— Finds all employees like Fudd, Fudge, and Fudston SELECT * FROM employee WHERE lname LIKE Fud% ---
/* Beginning multiline comment delimiter --- /* Finds all employees like Fudd, Fudge, and Fudston */ SELECT * FROM employee WHERE lname LIKE Fud% ---
*/ Ending multiline comment indicator

Keywords and Reserved Words

Just as certain symbols have special meaning and functionality within SQL, certain words and phrases have special significance. SQL keywords are words whose meanings are so closely tied to the operation of the RDBMS that they should not be used for any other purpose; generally, they are words used in SQL statements. Reserved words, on the other hand, do not have special significance now, but they probably will in a future release. Note that these words can be used as identifiers on most platforms, but they shouldn’t be. For example, the word “SELECT” is a keyword and should not be used as a table name. To emphasize the fact that keywords should not be used as identifiers but nevertheless could be, the SQL standard calls them “non-reserved keywords.”

Note

It is generally a good idea to avoid naming columns or tables after a keyword that occurs in any major platform, because database applications are frequently migrated from one platform to another.

Reserved words and keywords are not always words used in SQL statements; they may also be words commonly associated with database technology. For example, CASCADE is used to describe data manipulations that allow their actions, such as a delete or update operation, to “flow down,” or cascade, to any subordinate tables. Reserved words and keywords are widely published so that developers will not use them as identifiers that will, either now or at some later revision, cause a problem.

SQL specifies its own list of reserved words and keywords, as do the database platforms, because they each have their own extensions to the SQL command set. The SQL standard keywords, as well as the keywords in the different vendor implementations, are listed in the Appendix.

ANSI/ISO SQL and Platform-specific Data Types

A table can contain one or many columns. Each column must be defined with a data type that provides a general classification of the data that the column will store. In real-world applications, data types improve efficiency and provide some control over how tables are defined and how the data is stored within a table. Using specific data types enables better, more understandable queries and helps control the integrity of the data.

The tricky thing about ANSI/ISO SQL data types is that they do not always map directly to identical implementations in different platforms. Although the various platforms specify “data types” that correspond to the ANSI/ISO SQL data types, these are not always true ANSI/ISO SQL data types: for example, MySQL’s implementation of a BIT data type is actually identical to a CHAR(1) data type value. Nonetheless, each of the platform-specific data types is close enough to the standard to be both easily understandable and job-ready.

The official ANSI/ISO SQL data types (as opposed to platform-specific data types) fall into the general categories described in Table 2-8. Note that the ANSI/ISO SQL standard contains a few rarely used data types (ARRAY, MULTISET, REF, and ROW) that are shown only in Table 2-8 and not discussed elsewhere in the book.

Table 2-3. ANSI/ISO SQL categories and data types
Category Example data types and abbreviations Description
BINARY BINARY LARGE OBJECT (BLOB) This data type stores binary string values in hexadecimal format. Binary string values are stored without reference to any character set and without any length limit.
BOOLEAN BOOLEAN This data type stores truth values (either TRUE or FALSE).
CHARACTER string types CHAR
CHARACTER VARYING (VARCHAR)
These data types can store any combination of characters from the applicable character set. The varying data types allow variable lengths, while the other data types allow only fixed lengths. Also, the variable-length data types automatically trim trailing spaces, while the other data types pad all open space.
NATIONAL CHARACTER (NCHAR)
NATIONAL CHARACTER VARYING (NCHAR VARYING)
The national character data types are designed to support a particular implementation-defined character set.
CHARACTER LARGE OBJECT (CLOB) CHARACTER LARGE OBJECT and BINARY LARGE OBJECT are collectively referred to as large object string types.
NATIONAL CHARACTER LARGE OBJECT (NCLOB) Same as CHARACTER LARGE OBJECT, but supports a particular implementation-defined character set.
DATALINK DATALINK Defines a reference to a file or other external data source that is not part of the SQL environment.
INTERVAL INTERVAL Specifies a set of time values or span of time.
COLLECTION ARRAY MULTISET ARRAY was offered in SQL:1999, and MULTISET was added in SQL:2003. Whereas an ARRAY is a set-length, ordered collection of elements, MULTISET is a variable-length, unordered collection of elements. The elements in an ARRAY and a MULTISET must be of a predefined data type.
JSON JSON Added in SQL:2016 Part 13, this data type stores JSON data and can be used wherever a SQL data type is allowed. It’s implementation is similar to the XML extension in most ways.
NUMERIC INTEGER (INT) SMALLINT BIGINT NUMERIC(p,s) DEC[IMAL](p,s) FLOAT(p,s) REAL DOUBLE PRECISION These data types store exact numeric values (integers or decimals) or approximate (floating-point) values. INT, BIGINT, and SMALLINT store exact numeric values with a predefined precision and a scale of zero. NUMERIC and DEC store exact numeric values with a definable precision and a definable scale. FLOAT stores approximate numeric values with a definable precision, while REAL and DOUBLE PRECISION have predefined precisions. You may define a precision (p) and scale (s) for a DECIMAL, FLOAT, or NUMERIC data type to indicate the total number of allowed digits and the number of decimal places, respectively.
INT, SMALLINT, and DEC are sometimes referred to as exact numeric types, while FLOAT, REAL, and DOUBLE PRECISION are sometimes called approximate numeric types.
TEMPORAL DATE, TIME
TIME WITH TIME ZONE
TIMESTAMP
TIMESTAMP WITH TIME ZONE
These data types handle values related to time. DATE and TIME are self-explanatory. data types with the WITH TIME ZONE suffix also include a time zone offset. The TIMESTAMP data types are used to store a value that represents a precise moment in time. Temporal types are also known as datetime types.
XML XML Introduced in SQL:2011 Part 14, this data type stores XML data and can be used wherever a SQL data type is allowed (e.g., for a column of a table, a field in a row, etc.). Operations on the values of an XML type assume a tree-based internal data structure. The internal data structure is based on the XML Information Set Recommendation (Infoset), using a new document information item called the XML root information item.

Not every database platform supports every ANSI SQL data type. Table 2-9 compares data types across the five platforms. The table is organized by data type name.

Be careful to look for footnotes when reading this table, because some platforms support a data type of a given name but implement it in a different way than the ANSI/ISO standard and/or other vendors.

Note

While the different platforms may support similarly named data types, the details of their implementations may vary. The sections that follow this table list the specific requirements of each platform’s data types.

Table 2-4. Comparison of platform-specific data types
Vendor data type MySQL Oracle PostgreSQL SQL Server SQL data type
a b c d e f
g h BFILE Y
None BIGINT Y Y
Y BIGINT BINARY Y
Y BLOB BINARY_FLOAT Y
FLOAT BINARY_DOUBLE Y
DOUBLE PRECISION BIT Y Y
Y None BIT VARYING, VARBIT Y
None BLOB Y Y
BLOB BOOL, BOOLEAN Y Y
BOOLEAN BOX Y
None BYTEA Y
BLOB CHAR, CHARACTER Y Y Y
Y CHARACTER CHAR FOR BIT DATA
None CIDR Y
None CIRCLE Y
None CLOB Y
CLOB CURSOR
Y None DATALINK
DATALINK DATE Y Y Y
Y DATE DATETIME Y
Y TIMESTAMP DATETIMEOFFSET
Y TIMESTAMP DATETIME2
Y TIMESTAMP WITH TIME ZONE DBCLOB
NCLOB DEC, DECIMAL Y Y Y
Y DECIMAL DOUBLE, DOUBLE PRECISION Y Y Y
Y FLOAT ENUM Y Y
None FLOAT Y Y Y
Y DOUBLE PRECISION FLOAT4 Y
FLOAT(p) FLOAT8 Y
FLOAT(p) GRAPHIC
BLOB GEOGRAPHY
Y None GEOMETRY
Y None HIERARCHYID
Y None IMAGE
Y None INET Y
None INT, INTEGER Y Y Y
Y INTEGER INT2 Y
SMALLINT INT4 Y
INT, INTEGER INTERVAL Y
INTERVAL INTERVAL DAY TO SECOND Y Y
INTERVAL DAY TO SECOND INTERVAL YEAR TO MONTH Y Y
INTERVAL YEAR TO MONTH LINE Y
None LONG Y
None LONG VARCHAR
None LONGBLOB Y
BLOB LONG RAW Y
BLOB LONG VARGRAPHIC
None LONGTEXT Y
None LSEG Y
None MACADDR Y
None MEDIUMBLOB Y
None MEDIUMINT Y
INT MEDIUMTEXT Y
None MONEY Y
Y None NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NCHAR VARYING, NVARCHAR Y Y Y
Y NATIONAL CHARACTER VARYING NCHAR, NATIONAL CHAR, NATIONAL CHARACTER Y Y Y
Y NATIONAL CHARACTER NCLOB Y
NCLOB NTEXT, NATIONAL TEXT
Y NCLOB NVARCHAR2(n) Y
None NUMBER Y Y Y
Y None NUMERIC
Y NUMERIC OID Y
None PATH Y
None POINT Y
None POLYGON Y
None RAW Y
None REAL Y Y Y
Y REAL ROWID Y
None ROWVERSION
Y None SERIAL, SERIAL4 Y Y
None SERIAL8, BIGSERIAL Y
None SET Y
None SMALLDATETIME
Y None SMALLINT Y Y Y
Y SMALLINT SMALLMONEY
Y None SQL_VARIANT
Y None TABLE
Y None TEXT Y Y
Y None TIME Y Y
Y TIME TIMESPAN
INTERVAL TIMESTAMP Y Y Y
Y TIMESTAMP TIMESTAMP WITH TIME ZONE, TIMESTAMPTZ Y
TIMESTAMP WITH TIME ZONE TIMETZ Y
TIME WITH TIME ZONE TINYBLOB Y
Y None TINYINT Y
Y None TINYTEXT Y
None UNIQUEIDENTIFIER
Y None UROWID Y
None VARBINARY Y
Y BLOB VARCHAR, CHAR VARYING, CHARACTER VARYING Y Y Y
Y CHARACTER VARYING(n) VARCHAR2 Y
CHARACTER VARYING VARCHAR FOR BIT DATA
BIT VARYING VARGRAPHIC
NCHAR VARYING YEAR Y
TINYINT XML Y
Y XML XMLTYPE Y
a Synonym for FLOAT.
b Synonym for REAL.
c Synonym for DOUBLE PRECISION.
d Synonym for DECIMAL(9,2).
e Synonym for DECIMAL.
f Synonym for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.
g Implemented as a non-date data type.
h Oracle vastly prefers VARCHAR2.

a a

b b

c c

d d

e e

f f

g g

h h

The following sections list platform-specific data types, their ANSI/ISO SQL data type categories (if any), and pertinent details. Descriptions are provided for non-SQL data types.

MySQL Data Types

Both MySQL version 8 and MariaDb 10.5 have support for spatial data. Spatial data is handled in a variety of classes provided in the OpenGIS Geometry Model, which is supported by the MyISAM, InnoDB, Aria, anNDB, and ARCHIVE database engines. Only MyISAM, InnoDB, and Aria storage engines support both spatial and non-spatial indexes; the other database engines only support non-spatial indexes.

MySQL numeric data types support the following optional attributes:

UNSIGNED
The numeric value is assumed to be non-negative (positive or zero). For fixed-point data types such as DECIMAL and NUMERIC, the space normally used to show a positive or negative condition of the numeric value can be used as part of the value, providing a little extra numeric range in the column for these types. (There is no SIGNED optional attribute.)
ZEROFILL
Used for display formatting, this attribute tells MySQL that the numeric value is padded to its full size with zeros rather than spaces. ZEROFILL automatically forces the UNSIGNED attribute as well.

MySQL also enforces a maximum display size for columns of up to 255 characters. Columns longer than 255 characters are stored properly, but only 255 characters are displayed. Floating-point numeric data types may have a maximum of 30 digits after the decimal point.

The following list enumerates the data types MySQL supports. These include most of the ANSI/ISO SQL data types, plus several additional data types used to contain lists of values, as well as data types used for binary large objects (BLOBs). Data types that extend the ANSI/ISO standard include TEXT, ENUM, SET, and MEDIUMINT. Special data type attributes that go beyond the ANSI/ISO standard include AUTO_INCREMENT, BINARY, FIXED, NULL, UNSIGNED, and ZEROFILL. The data types supported by MySQL are:

BIGINT[(n)] [UNSIGNED] [ZEROFILL] (ANSI/ISO SQL data type: BIGINT)
Stores signed or unsigned integers. The signed range is −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. The unsigned range is 0 to 18,446,744,073,709,551,615. BIGINT may perform imprecise calculations with very large numbers (63 bits), due to rounding issues.
BINARY[(n)] (ANSI/ISO SQL data type: BLOB)
Stores binary byte strings of optional length n. Otherwise, similar to the BLOB data type.
BIT [(n)], BOOL, BOOLEAN (ANSI/ISO SQL data type: BOOLEAN)
Synonyms for TINYINT, usually used to store only 0’s or 1’s. N specifies the number of bits from 1 to 64. If N is omitted, the default is 1 bit.
BLOB (ANSI/ISO SQL data type: BLOB)
Stores up to 65,535 characters of data. Support for indexing BLOB columns is found only in MySQL version 3.23.2 or greater (this feature is not found in any other platform covered in this book). In MySQL, BLOBs are functionally equivalent to the MySQL data type VARCHAR BINARY (discussed later) with the default upper limit. BLOBs always require case-sensitive comparisons. BLOB columns differ from MySQL VARCHAR BINARY columns by not allowing DEFAULT values. You cannot perform a GROUP BY or ORDER BY on BLOB columns. Depending on the storage engine being used, BLOBs also are sometimes stored separately from their tables, whereas all other data types in MySQL (with the exception of TEXT) are stored in the table file structure itself.
CHAR(n) [BINARY], CHARACTER(n) [BINARY] (ANSI/ISO SQL data type: CHARACTER(n))
Contains a fixed-length character string of 1 to 255 characters. CHAR pads with blank spaces when it stores values but trims spaces upon retrieval, just as ANSI ANSI/ISO SQL VARCHAR does. The BINARY option allows binary searches rather than dictionary-order, case-insensitive searches.
DATE (ANSI/ISO SQL data type: DATE)
Stores a date within the range of 1000-01-01 to 9999-12-31 (delimited by quotes). MySQL displays these values by default in the format YYYY-MM-DD, though the user may specify some other display format.
DATETIME (ANSI/ISO SQL data type: TIMESTAMP)
Stores date and time values within the range of 1000-01-01 00:00:00 to 9999-12-31 23:59:59.
DECIMAL[(p[,s])] [UNSIGNED] [ZEROFILL], DEC[(p[,s])] [UNSIGNED] [ZEROFILL], FIXED [(p[,s])] [UNSIGNED] [ZEROFILL] ANSI/ISO SQL data type: DECIMAL(PRECISION, SCALE))
Stores exact numeric values as if they were strings, using a single character for each digit, up to 65 digits in length. Precision is 10 if omitted, and scale is 0 if omitted. FIXED is a synonym for DECIMAL provided for backward compatibility with other database platforms.
DOUBLE[(p,s)] [ZEROFILL], DOUBLE PRECISION[(p,s)] [ZEROFILL] (ANSI/ISO SQL data type: DOUBLE PRECISION)
Holds double-precision numeric values and is otherwise identical to the double-precision FLOAT data type, except for the fact that its allowable range is −1.7976931348623157E+308 to −2.2250738585072014E-308, 0, and 2.2250738585072014E-308 to 1.7976931348623157E+308.
ENUM(“val1,” “val2,” . . . n) [CHARACTER SET cs_name] [COLLATE collation_name] (ANSI/ISO SQL data type: none)
Holds a list of allowable values (expressed as strings but stored as integers). Other possible values for the data type are NULL, or an empty string (““) as an error value. Up to 65,535 distinct values are allowed.
FLOAT[(p[,s])] [ZEROFILL] (ANSI/ISO SQL data type: FLOAT(P))
Stores floating-point numbers in the range −3.402823466E+38 to −1.175494351E-38 and 1.175494351E-38 to 3.402823466E+38. FLOAT without a precision, or with a precision of <= 24, is single-precision. Otherwise, FLOAT is double-precision. When specified alone, the precision can range from 0 to 53. When you specify both precision and scale, the precision may be as high as 255 and the scale may be as high as 253. All FLOAT calculations in MySQL are done with double precision and may, since FLOAT is an approximate data type, encounter rounding errors.
INT[EGER][(n)] [UNSIGNED] [ZEROFILL] [AUTO_INCREMENT] (ANSI/ISO SQL data type: INTEGER)
Stores signed or unsigned integers. For ISAM tables, the signed range is from −2,147,483,648 to 2,147,483,647 and the unsigned range is from 0 to 4,294,967,295. The range of values varies slightly for other types of tables. AUTO_INCREMENT is available to all of the INT variants; it creates a unique row identity for all new rows added to the table. (Refer to the section “CREATE/ALTER DATABASE Statement” in Chapter 3 for more information on AUTO_INCREMENT.)
LONGBLOB (ANSI/ISO SQL data type: BLOB)
Stores BLOB data up to 4,294,967,295 characters in length. Note that this might be too much information for some client/server protocols to support.
LONGTEXT [CHARACTER SET cs_name] [COLLATE collation_name] (ANSI/ISO SQL data type: CLOB)
Stores TEXT data up to 4,294,967,295 characters in length (less if the characters are multibyte). Note that this might be too much data for some client/server protocols to support.
MEDIUMBLOB (ANSI/ISO SQL data type: none)
Stores BLOB data up to 16,777,215 bytes in length. The first three bytes are consumed by a prefix indicating the total number of bytes in thevalue.
MEDIUMINT[(n)] [UNSIGNED] [ZEROFILL] (ANSI/ISO SQL data type: none)
Stores signed or unsigned integers. The signed range is from 8,388,608 to −8,388,608, and the unsigned range is 0 to 16,777,215.
MEDIUMTEXT [CHARACTER SET cs_name] [COLLATE collation_name] (ANSI/ISO SQL data type: none)
Stores TEXT data up to 16,777,215 characters in length (less if the characters are multibyte). The first three bytes are consumed by a prefix indicating the total number of bytes in the value.
NCHAR(n) [BINARY], [NATIONAL] CHAR(n) [BINARY] (ANSI/ISO SQL data type: NCHAR(n))
Synonyms for CHAR. The NCHAR data types provide UNICODE support beginning in MySQL v4.1.
NUMERIC(p,s) (ANSI/ISO SQL data type: DECIMAL(p,s))
Synonym for DECIMAL.
NVARCHAR(n) [BINARY], [NATIONAL] VARCHAR(n) [BINARY], NATIONAL CHARACTER VARYING(n) [BINARY] (ANSI/ISO SQL data type: NCHAR VARYING)
Synonyms for VARYING [BINARY]. Hold variable-length character strings up to 255 characters in length. Values are stored and compared in a case-insensitive fashion unless the BINARY keyword is used.
REAL(p,s) (ANSI/ISO SQL data type: REAL)
Synonym for DOUBLE PRECISION.
SERIAL
Synonym for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE. SERIAL is useful as an auto-incrementing primary key.
SET(“val1,” “val2,” . . . n) [CHARACTER SET cs_name] [COLLATE collation_name] (ANSI/ISO SQL data type: none)
A CHAR data type whose value must be equal to zero or more values specified in the list of values. Up to 64 items are allowed in the list of values.
SMALLINT[(n)] [UNSIGNED] [ZEROFILL] (ANSI/ISO SQL data type: SMALLINT)
Stores signed or unsigned integers. The signed range is from −32,768 to 32,767, and the unsigned range is from 0 to 65,535.
TEXT (ANSI/ISO SQL data type: none)
Stores up to 65,535 characters of data. TEXT data types are sometimes stored separately from their tables, depending on the storage engine used, whereas all other data types (with the exception of BLOB) are stored in their respective table file structures. TEXT is functionally equivalent to VARCHAR with no specific upper limit (besides the maximum size of the column), and it requires case-insensitive comparisons. TEXT differs from a standard VARCHAR column by not allowing DEFAULT values. TEXT columns cannot be used in GROUP BY or ORDER BY clauses. In addition, support for indexing TEXT columns is provided only in MySQL version 3.23.2 and greater.
TIME (ANSI/ISO SQL data type: none)
Stores time values in the range of 838:59:59 to 838:59:59, in the format HH:MM:SS. The values may be assigned as strings or numbers.
TIMESTAMP (ANSI/ISO SQL data type: TIMESTAMP)
Stores date values in the range of 1970-01-01 00:00:01 to partway through the year 2038. The values are expressed as the number of seconds since 1970-01-01 00:00:01. Timestamp values are always displayed in the format YYYY-MM-DD HH:MM:SS.
TINYBLOB (ANSI/ISO SQL data type: BLOB)
Stores BLOB values of up to 255 bytes, the first byte being consumed by a prefix indicating the total number of bytes in the value.
TINYINT[(n)] [UNSIGNED] [ZEROFILL] (ANSI/ISO SQL data type: INTEGER)
Stores very small signed or unsigned integers ranging from -128 to 127, if signed, and from 0 to 255 if unsigned.
TINYTEXT (ANSI/ISO SQL data type: none)
Stores TEXT values of up to 255 characters (less if they are multibyte characters). The first byte is consumed by a prefix indicating the total number of bytes in the value.
VARBINARY(n) (ANSI/ISO SQL data type: BLOB)
Stores variable-length binary byte strings of length n. Otherwise, similar to the VARCHAR data type.
YEAR (ANSI/ISO SQL data type: none)
Stores the year in a two- or four-digit (the default) format. Two-digit years allow values of 70 to 69, meaning 1970 to 2069, while four-digit years allow values of 1901 to 2155, plus 0000. YEAR values are always displayed in YYYY format but may be assigned as strings or numbers.

Oracle Datatypes

As you’ll see in this section, Oracle supports a rich variety of data types, including most of the SQL data types and some special data types. The special data types, however, often require optional components to be installed. For example, Oracle supports spatial data types, but only if you have installed the Oracle Spatial add-on. The Oracle Spatial data types, including SDO_GEOMETRY, SDO_TOPO_GEOMETRY, and SDO_GEORASTER, are beyond the scope of this book. Refer to the Oracle Spatial documentation for further details on these types.

Oracle Multimedia data types use object types, similar to Java or C++ classes for multimedia data. Oracle Multimedia data types include ORDAudio, ORDImage, ORDVideo, ORDDoc, ORDDicom, SI_Stillimage, SI_Color, SI_AverageColor, SI_ColorHistogram, SI_PositionalColor, SI_Texture, SI_FeatureList, and ORDImageSignature.

Oracle also supports “Any Types” data types. These highly flexible data types are intended for use as procedure parameters and as table columns where the actual type is unknown. The Any Type data types are ANYTYPE, ANYDATA, and ANYDATASET.

A complete listing of the Oracle data types follows:

BFILE (ANSI/ISO SQL data type: DATALINK)
Holds a pointer to a BLOB stored outside the database, but present on the local server, of up to 4 GB in size. The database streams input (but not output) access to the external BLOB. If you delete a row containing a BFILE value, only the pointer value is deleted; the actual file structure is not deleted.
BINARY_DOUBLE (ANSI/ISO SQL data type: FLOAT)
Holds a 64-bit floating-point number.
BINARY_FLOAT (ANSI/ISO SQL data type: FLOAT)
Holds a 32-bit floating-point number.
BLOB (ANSI/ISO SQL data type: BLOB)
Holds a binary large object (BLOB) value of between 8 and 128 terabytes in size, depending on the database block size. In Oracle, large binary objects (BLOBs, CLOBs, and NCLOBs) have the following restrictions:
  • They cannot be selected remotely.
  • They cannot be stored in clusters.
  • They cannot compose a varray.
  • They cannot be a component of an ORDER BY or GROUP BY clause in a query.
  • They cannot be used by an aggregate function in a query.
  • They cannot be referenced in queries using DISTINCT, UNIQUE, or joins.
  • They cannot be referenced in ANALYZE . . . COMPUTE or ANALYZE . . . ESTIMATE statements.
  • They cannot be part of a primary key or index key.
  • They cannot be used in the UPDATE OF clause in an UPDATE trigger.
CHAR(n) [BYTE | CHAR], CHARACTER(n) [BYTE | CHAR] (ANSI/ISO SQL data type: CHARACTER(n))
Holds fixed-length character data of up to 2,000 bytes in length. BYTE tells Oracle to use bytes for the size measurement. CHAR tells Oracle to use characters for the size measurement.
CLOB (ANSI/ISO SQL data type: CLOB)
Stores a character large object (CLOB) value of between 8 and 128 terabytes in size, depending on the database block size. See the description of the BLOB data type for a list of restrictions on the use of the CLOB type.
DATE (ANSI/ISO SQL data type: DATE)
Stores a valid date and time within the range of 4712BC-01-01 00:00:00 to 9999AD-12-31 23:59:59.
DECIMAL(p,s) (ANSI/ISO SQL data type: DECIMAL(p,s))
A synonym for NUMBER that accepts precision and scale arguments.
DOUBLE PRECISION (ANSI/ISO SQL data type: DOUBLE PRECISION)
Stores floating-point values with double precision, the same as FLOAT(126).
FLOAT(n) (ANSI/ISO SQL data type: FLOAT(n))
Stores floating-point numeric values with a binary precision of up to 126.
INTEGER(n) (ANSI/ISO SQL data type: INTEGER)
Stores signed and unsigned integer values with a precision of up to 38. INTEGER is treated as a synonym for NUMBER.
INTERVAL DAY(n) TO SECOND(x) (ANSI/ISO SQL data type: INTERVAL)
Stores a time span in days, hours, minutes, and seconds, where n is the number of digits in the day field (values from 0 to 9 are acceptable, and 2 is the default) and x is the number of digits used for fractional seconds in the seconds field (values from 0 to 9 are acceptable, and 6 is the default).
INTERVAL YEAR(n) TO MONTH (ANSI/ISO SQL data type: INTERVAL)
Stores a time span in years and months, where n is the number of digits in the year field. The value of n can range from0 to 9, with a default of 2.
LONG (ANSI/ISO SQL data type: none)
Stores variable-length character data of up to 2 gigabytes in size. Note, however, that LONG is not scheduled for long-term support by Oracle. Use another data type, such as CLOB, instead of LONG whenever possible.
LONG RAW (ANSI/ISO SQL data type: none)
Stores raw variable-length binary data of up to 2 gigabytes in size. LONG RAW and RAW are typically used to store graphics, sounds, documents, and other large data structures. BLOB is preferred over LONG RAW in Oracle, because there are fewer restrictions on its use. LONG RAW is deprecated.
NATIONAL CHARACTER VARYING(n), NATIONAL CHAR VARYING(n), NCHAR VARYING(n) (ANSI/ISO SQL data type: NCHAR VARYING (n))
Synonyms for NVARCHAR2.
NCHAR(n), NATIONAL CHARACTER(n), NATIONAL CHAR(n) (ANSI/ISO SQL data type: NATIONAL CHARACTER)
Holds UNICODE character data of 1 to 2,000 bytes in length. Default size is 1 byte.
NCLOB (ANSI/ISO SQL data type: NCLOB)
Represents a CLOB that supports multibyte and UNICODE values of between 8 and 128 terabytes in size, depending on the database block size. See the description of the BLOB data type for a list of restrictions on the use of the NCLOB type.
NUMBER(p,s), NUMERIC(p,s) (ANSI/ISO SQL data type: NUMERIC(p,s))
Stores a number with a precision of 1 to 38 and a scale of −84 to 127.
NVARCHAR2(n) (ANSI/ISO SQL data type: none)
Represents Oracle’s preferred UNICODE variable-length character data type. Can hold data of 1 to 4,000 bytes in size.
RAW(n) (ANSI/ISO SQL data type: none)
Stores raw, variable-length binary data of up to 2,000 bytes in size. The value n is the specified size of the data type. RAW is also deprecated in Oracle 11g. (See LONG RAW.)
REAL (ANSI/ISO SQL data type: REAL)
Stores floating-point values as single-precision. Same as FLOAT(63).
ROWID (ANSI/ISO SQL data type: none)
Represents a unique, base-64 identifier for each row in a table, often used in conjunction with the ROWID pseudocolumn.
SMALLINT (ANSI/ISO SQL data type: SMALLINT)
Synonym for INTEGER.
TIMESTAMP(n) {[WITH TIME ZONE] | [WITH LOCAL TIME ZONE]} (ANSI/ISO SQL data type: TIMESTAMP[WITH TIME ZONE])
Stores a full date and time value, where n is the number of digits (values from 0 to 9 are acceptable, and 6 is the default) in the fractional part of the seconds field. WITH TIME ZONE stores whatever time zone you pass to it (the default is your session time zone) and returns a time value in that same time zone. WITH LOCAL TIME ZONE stores data in the time zone of the current session and returns data in the time zone of the user’s session.
URITYPE (ANSI/ISO SQL data type: XML)
Stores a Uniform Resource Identifier (URI), operating much like a standard URL which references a document or even a specific point within a document. This data type is a supertype containing three subtypes, existing in an inheritance hierarchy: DBURIType, XDBURIType, and HTTPURIType. You would typically create a table using the URIType then store DBURITYPE (for DBURIREF values using an XPath nomenclature to reference data stored elsewhere in the database or another database), HTTPURITYPE (for HTTP web pages and files), or XDBURITYPE (for exposing documents in the XML database hierarchy) in the column. You will typically manipulate this type of data using the URIFactory Package. Refer to the vendor documentation for more information on the URIFactory Package.
UROWID[(n)] (ANSI/ISO SQL data type: none)
Stores a base-64 value showing the logical address of the row in its table. Defaults to 4,000 bytes in size, but you may optionally specify a size of anywhere up to 4,000 bytes.
VARCHAR(n), CHARACTER VARYING(n), CHAR VARYING(n) (ANSI/ISO SQL data type: CHARACTER VARYING(n))
Holds variable-length character data of 1 to 4,000 bytes in size.
Note

Oracle does not recommend using VARCHAR and has for many years instead encouraged the use of VARCHAR2.

VARCHAR2(n [BYTE | CHAR]) (ANSI/ISO SQL data type: CHARACTER VARYING(n))
Holds variable-length character data of up to 4,000 bytes in length, as defined by n. BYTE tells Oracle to use bytes forthe size measurement. CHAR tells Oracle to use characters for the size measurement. If you use CHAR, Oracle internally must still transform that into some number of bytes, which is then subject to the 4,000-byte upper limit.
XMLTYPE (ANSI/ISO SQL data type: XML)
Stores XML data within the Oracle database. The XML data is accessed using XPath expressions as well as a number of built-in XPath functions, SQL functions, and PL/SQL packages. The XMLTYPE data type is a system-defined type, so it is usable as an argument in functions, or as the data type of a column in a table or view. When used in a table, the data can be stored in a CLOB column or object-relationally.

PostgreSQL Data types

The PostgreSQL database supports most ANSI/ISO SQL data types, plus an extremely rich set of data types that store spatial and geometric data. PostgreSQL sports a rich set of operators and functions especially for the geometric data types, including capabilities such as rotation, finding intersections, and scaling. These have existed for a while and not that widely used since they pre-date standards for managing spatial data.

OpenGeospatial Standards compliant support is provided via an open source extension called PostGIS https://postgis.net, which is more commonly used than the built-in PostgreSQL geometric support. PostGIS sports both a geometry (flat-earth) and geography (round-earth) model as well as support for transforming between spatial projections. These types support numerous subtypes that can be expressed as typmodifiers e.g geometry(POLYGON,4326) for a polygon column storing WGS 84 long-lat. PostGIS also supports the newer SQL/MM standards which includes support for 3-dimensional types such as Triangular Irregular Networks (TINs) and PolyhedralSurfaces.

PostgreSQL also supports additional versions of existing data types that are smaller and take up less disk space than their corresponding primary data types. For example, PostgreSQL offers several variations on INTEGER to accommodate small or large numbers and thereby consume proportionally less or more space. Here’s a list of the data types it supports:

BIGINT, INT8 (ANSI/ISO SQL data type: none)
Stores signed or unsigned 8-byte integers within the range of −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
BIGSERIAL
See SERIALS.
BIT (ANSI/ISO SQL data type: BIT)
Stores a fixed-length bit string.
BIT VARYING(n), VARBIT(n) (ANSI/ISO SQL data type: BIT VARYING)
Stores a variable-length bit string whose length is denoted by n.
BOOL, BOOLEAN (ANSI/ISO SQL data type: BOOLEAN)
Stores a logical Boolean (true/false/unknown) value. The keywords TRUE and FALSE are preferred, but PostgreSQL supports the following valid literal values for the “true” state: TRUE, t, true, y, yes, and 1. Valid “false” values are: FALSE, f, false, n, no, and 0.
BOX( (x1, y1), (x2, y2) ) (ANSI/ISO SQL data type: none)
Stores the values of a rectangular box in a 2D plane. Values are stored in 32 bytes and are represented as ( (x1, y1), (x2, y2) ), signifying the opposite corners of the box (upper-right and lower-left corners, respectively). The outer parentheses are optional.
BYTEA (ANSI/ISO SQL data type: BINARY LARGE OBJECT)
Holds raw, binary data; typically used to store graphics, sounds, or documents. For storage, this data type requires 4 bytes plus the actual size of the binary string.
CHAR(n), CHARACTER(n) (ANSI/ISO SQL data type: CHARACTER(n))
Contains a fixed-length character string padded with spaces up to a length of n. Attempting to insert a value longer than n results in an error (unless the extra length is composed of spaces, which are then truncated such that the result fits in n characters).
CIDR(x.x.x.x/y) (ANSI/ISO SQL data type: none)
Describes an IP version 4 (IPv4) network or host address in a 12-byte storage space. The range is any valid IPv4 network address. Data in CIDR data types is represented as x.x.x.x/y, where the xs are the IP address and y is the number of bits in the netmask. CIDR does not accept nonzero bits to the right of a zero bit in the netmask.
CIRCLE(x, y, r) (ANSI/ISO SQL data type: none)
Describes a circle in a 2D plane. Values are stored in 24 bytes of storage space and are represented as (x, y, r). The x, y value represents the coordinates of the center of the circle, while r represents the length of the radius. Parentheses or arrow brackets may optionally delimit the values for x, y, and r.
DATE (ANSI/ISO SQL data type: DATE)
Holds a calendar date (year, day, and month) without the time of day in a4-byte storage space. Dates must be between 4713 BC and 32767 AD. DATE’s lowest resolution, naturally, is to the day.
DECIMAL[(p,s)], NUMERIC[(p,s)] (ANSI/ISO SQL data type: DECIMAL(p,s), NUMERIC(p,s))
Stores exact numeric values with a precision (p) in the range of 0 to 9 and a scale (s) of 0, with no upper limit.
FLOAT4, REAL (ANSI/ISO SQL data type: FLOAT(p))
Stores floating-point numbers with a precision of 0 to 8 and 6 decimal places.
FLOAT8, DOUBLE PRECISION (ANSI/ISO SQL data type: FLOAT(p), 7 <= p < 16)
Stores floating-point numbers with a precision of 0 to 16 and 15 decimal places.
INET(x.x.x.x/y) (ANSI/ISO SQL data type: none)
Stores an IP version 4 network or host address in a 12-byte storage space. The range is any valid IPv4 network address. The xs represent the IP address, and y is the number of bits in the netmask. The netmask defaults to 32. Unlike CIDR, INET accepts nonzero bits to the right of the netmask.
INTEGER, INT, INT4 (ANSI/ISO SQL data type: INTEGER)
Stores signed or unsigned 4-byte integers within the range of −2,147,483,648 to 2,147,483,647.
INTERVAL(p) (ANSI/ISO SQL data type: none)
Holds general-use time-span values within the range of −178,000,000 to 178,000,000 years in a 12-byte storage space. INTERVAL’s lowest resolution is to the microsecond. This is a different data type than the ANSI standard, which requires an interval qualifier such as INTERVAL YEAR TO MONTH.
JSON (SQL data type: json)
JSON data type stored as plain text. It maintains the fidelity of the data put in it and adds on JSON validation checking to prevent invalid JSON data.
JSONB (SQL data type: json)
JSON data type stored as binary. JSONB has richer support for indexing than JSON and is more compact and faster to pull sub-elements of JSON. This is the preferred data type for storing JSON data. Unlike JSON, data added to it is restored for more efficient query handling and does not allow duplication of keys. In case of duplicates, the last value wins. As such you will find it may not match exactly what you inserted into it, so not suitable if you need to maintain the exactness of what was inserted.
LINE( ( x1, y1), (x2, y2) ) (ANSI/ISO SQL data type: none)
Holds line data, without endpoints, in 2D plane values. Values are stored in 32 bytes and are represented as ( (x1, y1), (x2, y2) ), indicating the start and end points of a line. The enclosing parentheses are optional for line syntax.
LSEG( ( x1, y1), (x2, y2) ) (ANSI/ISO SQL data type: none)
Holds line segment (LSEG) data, with endpoints, in a 2D plane. Values are stored in 32 bytes and are represented as ( (x1, y1), (x2, y2) ). The outer parentheses are optional for LSEG syntax. For those who are interested, the “line segment” is what most people traditionally think of as a line. For example, the lines on a playing field are actually line segments.
Note

In true geometric nomenclature, a line stretches to infinity, having no terminus at either end, while a line segment has end points. PostgreSQL has data types for both, but they are functionally equivalent.

MACADDR (ANSI/ISO SQL data type: none)

Holds a value for the MAC address of a computer’s network interface card in a 6-byte storage space. MACADDR accepts a number of industry standard representations, such as:

08002B:010203

08002B-010203

0800.2B01.0203

08-00-2B-01-02-03

08:00:2B:01:02:03

MONEY, DECIMAL(9,2) (ANSI/ISO SQL data type: none)
Stores U.S.-style currency values in the range of −21,474,836.48 to 21,474,836.47.
NUMERIC[(p,s)], DECIMAL[(p,s)](ANSI/ISO SQL data type: none)
Stores exact numeric values with a precision (p) and scale (s).
OID (ANSI/ISO SQL data type: none)
Stores unique object identifiers.
PATH((x1, y1), . . . n), PATH[(x1, y1), . . . n] (ANSI/ISO SQL data type: none)
Describes an open and closed geometric path in a 2D plane. Values are represented as [(x1, y1), . . . n] and consume 4 + 32n bytes of storage space. Each (x, y) value represents a point on the path. Paths are either open, where the first and last points do not intersect, or closed, where the first and last points do intersect. Parentheses are used to encapsulate closed paths, while brackets encapsulate open paths.
POINT(x, y) (ANSI/ISO SQL data type: none)
Stores values for a geometric point in a 2D plane in a 16-byte storage space. Values are represented as (x, y). The point is the basis for all other two-dimensional spatial data types supported in PostgreSQL. Parentheses are optional for point syntax.
POLYGON( (x1, y1), . . . n ) (ANSI/ISO SQL data type: none)
Stores values for a closed geometric path in a 2D plane using 4 + 32n bytes of storage. Values are represented as ( (x1,y1), . . . n ); the enclosing parentheses are optional. POLYGON is essentially a closed-path data type.
SERIAL, SERIAL4 (ANSI/ISO SQL data type: none)
Stores an autoincrementing, unique integer ID for indexing and cross-referencing. Can contain up to 4 bytes of data (a range of numbers from 1 to 2,147,483,647). Tables defined with this data type cannot be directly dropped: you must first issue the DROP SEQUENCE command, then follow up with the DROP TABLE command.
SERIAL8, BIGSERIAL (ANSI/ISO SQL data type: none)
Stores an autoincrementing, unique integer ID for indexing and cross-referencing. Can contain up to 8 bytes of data (a range of numbers from 1 to 9,223,372,036,854,775,807). Tables defined with this data type cannot be directly dropped: you must first issue the DROP SEQUENCE command, then follow up with the DROP TABLE command.
SMALLINT (ANSI/ISO SQL data type: SMALLINT)
Stores signed or unsigned 2-byte integers within the range of −32,768 to 32,767. INT2 is a synonym.
TEXT (ANSI/ISO SQL data type: CLOB)
Stores large, variable-length character-string data of up to 1 gigabyte. PostgreSQL automatically compresses TEXT strings, so the disk size may be less than the string size.
TIME[(p)] [WITHOUT TIME ZONE | WITH TIME ZONE] (ANSI/ISO SQL data type: TIME)
Holds the time of day and stores either no time zone (using 8 bytes of storage space) or the time zone of the database server (using 12 bytes of storage space). The allowable range is from 00:00:00.00 to 23:59:59.99. The lowest granularity is 1 microsecond. Note that time zone information on most Unix systems is available only for the years 1902 through 2038.
TIMESTAMP[(p)] [WITHOUT TIME ZONE | WITH TIME ZONE] (ANSI/ISO SQL data type: TIMESTAMP [WITH TIME ZONE | WITHOUT TIME ZONE])
Holds the date and time and stores either no time zone or the time zone of the database server. The range of values is from 4713 BC to 1465001 AD. TIMESTAMP uses 8 bytes of storage space per value. The lowest granularity is 1 microsecond. Note that time zone information on most Unix systems is available only for the years 1902 through 2038.
TIMETZ (ANSI/ISO SQL data type: TIME WITH TIME ZONE)
Holds the time of day, including the time zone.
TSQUERY (ANSI/ISO SQL data type: none)__
Used for full text search is a textual way of defining a full text query that is then applied to a TSVECTOR.
TSVECTOR (ANSI/ISO SQL: none)__
Used for full text search is a binary format consisting of lexemes and frequency.
VARCHAR(n), CHARACTER VARYING(n) (ANSI/ISO SQL data type: CHARACTER VARYING(n))
Stores variable-length character strings of up to a length of n. Trailing spaces are not stored.

SQL Server Data Types

Microsoft SQL Server supports most ANSI/ISO SQL data types, as well as some additional data types used to uniquely identify rows of data within a table and across multiple servers, such as UNIQUEIDENTIFIER. These data types are included in support of Microsoft’s hardware philosophy of “scale-out” (that is, deploying on many Intel-based servers) rather than “scale-up” (deploying on a single huge, high-end Unix server or a Windows Data Center Server).

Similar to the other databases, SQL Server has OpenGeospatial support. It is most similar to PostGIS in how it implements these types - with a dedicated geometry type for flat-earth model and geography type for round-earth. It has the richest support for curved geometries and round-earth than any of the other databases discussed in this book, but lacks spatial reprojection support that both PostGIS and Oracle offer that is commonly needed for GIS work.

Note

Here’s an interesting side note about SQL Server dates: SQL Server supports dates starting at the year 1753, and you can’t store dates prior to that year using any of SQL Server’s date data types. Why not? The rationale is that the English-speaking world started using the Gregorian calendar in 1753 (the Julian calendar was used prior to September, 1753), and converting dates prior to Julian to the Gregorian calendar can be quite challenging.

The data types SQL Server supports are:

BIGINT (ANSI/ISO SQL data type: BIGINT)
Stores signed and unsigned integers in the range of −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, using 8 bytes of storage space. See INT for IDENTITY property rules that also apply to BIGINT.
BINARY[(n)] (ANSI/ISO SQL data type: BLOB)
Stores a fixed-length binary value of 1 to 8,000 bytes in size. BINARY data types consume n + 4 bytes of storage space.
BIT (ANSI/ISO SQL data type: BOOLEAN)
Stores a value of 1, 0, or NULL (to indicate “unknown”). Up to eight BIT columns on a single table will be stored in a single byte. An additional eight BIT columns consume one more byte of storage space. BIT columns cannot be indexed.
CHAR[(n)], CHARACTER[(n)] (ANSI/ISO SQL data type: CHARACTER(n))
Holds fixed-length character data of 1 to 8,000 characters in length. Any unused space is, by default, padded with spaces. (You can disable the automatic padding.) Storage size is n bytes.
CURSOR (ANSI/ISO SQL data type: none)
A special data type used to describe a cursor as a variable or stored procedure OUTPUT parameter. It cannot be used in a CREATE TABLE statement. The CURSOR data type is always nullable.
DATE (ANSI/ISO SQL data type: DATE)
Holds a date in the range of January 1, 0001 AD to December 31, 9999 AD.
DATETIME (ANSI/ISO SQL data type: TIMESTAMP)
Holds a date and time within the range of 1753-01-01 00:00:00 through 9999-12-31 23:59:59. Values are stored in an 8-byte storage space.
DATETIME2 (ANSI/ISO SQL data type: TIMESTAMP)
Holds a date and time within the range of January 1, 0001 AD to December 31, 9999 AD, to an accuracy of 100 nanoseconds.
DATETIMEOFFSET (ANSI/ISO SQL data type: TIMESTAMP)
Holds a date and time within the range of January 1, 0001 AD to December 31, 9999 AD, to an accuracy of 100 nanoseconds. Also includes time zone information. Values are stored in a 10-byte storage space.
DECIMAL(p,s), DEC(p,s), NUMERIC(p,s) (ANSI/ISO SQL data type: DECIMAL(p,s), NUMERIC(p,s))

Stores decimal values up to 38 digits long. The values p and s define the precision and scale, respectively. The default value for the scale is 0. The precision of the data type determines how much storage space it will consume:

Precision 1-9 uses 5 bytes

Precision 10-19 uses 9 bytes

Precision 20-28 uses 13 bytes

Precision 29-39 uses 17 bytes

See INT for IDENTITY property rules that also apply to DECIMAL

DOUBLE PRECISION (ANSI/ISO SQL data type: none)
Synonym for FLOAT(53).
FLOAT[(n)] (ANSI/ISO SQL data type: FLOAT, FLOAT(n))
Holds floating-point numbers in the range of −1.79E+308 through 1.79E +308. The precision, represented by n, may be in the range of 1 to 53. The storage size is 4 bytes for 7 digits, where n is in the range of 1 to 24. Anything larger requires 8 bytes of storage.
HIERARCHYID (ANSI/ISO SQL data type: none)
Represents a hierarchy or tree structure within the relational data. Although it may consume more space, HIERARCHYID will usually consume 5 bytes or less. Refer to the vendor documentation for more information on this special data type.
IMAGE (ANSI/ISO SQL data type: BLOB)
Stores a variable-length binary value of up to 2,147,483,647 bytes in length. This data type is commonly used to store graphics, sounds, and files such as MS-Word documents and MS-Excel spreadsheets. IMAGE cannot be freely manipulated; both IMAGE and TEXT columns have a lot of constraints on how they can be used. See TEXT for a list of the commands and functions that work on an IMAGE data type.
INT [IDENTITY [ (seed, increment)] (ANSI/ISO SQL data type: INTEGER)
Stores signed or unsigned integers within the range of −2,147,483,648 to 2,147,483,647 in 4 bytes of storage space. All integer data types, as well as the decimal type, support the IDENTITY property. An identity is an automatically incrementing row identifier. Refer to the section “CREATE/ALTER DATABASE Statement” in Chapter 3 for more information.
MONEY (ANSI/ISO SQL data type: none)
Stores monetary values within the range of −922,337,203,685,477.5808 to 922,337,203,685,477.5807, in an 8-byte storage space.
NCHAR(n), NATIONAL CHAR(n), NATIONAL CHARACTER(n) (ANSI/ISO SQL data type: NATIONAL CHARACTER(n))
Holds fixed-length UNICODE data of up to 4,000 characters in length. The storage space consumed is double the character length inserted into the field (2 * n).
NTEXT, NATIONAL TEXT (ANSI/ISO SQL data type: NCLOB)
Holds UNICODE text passages of up to 1,073,741,823 characters in length. See TEXT for rules about the commands and functions available for NTEXT.
NUMERIC(p,s) (ANSI/ISO SQL data type: DECIMAL(p,s))
Synonym for DECIMAL. See INT for rules about the IDENTITY property that also apply to this type.
NVARCHAR(n), NATIONAL CHAR VARYING(n), NATIONAL CHARACTER VARYING(n) (ANSI/ISO SQL data type: NATIONAL CHARACTER VARYING(n))
Holds variable-length UNICODE data of up to 4,000 characters in length. The storage space consumed is double the character length inserted into the field (2 * n). The system setting SET ANSI_PADDING is always enabled (ON) for NCHAR and NVARCHAR fields in SQL Server.
REAL, FLOAT(24) (ANSI/ISO SQL data type: REAL)
Holds floating-point numbers in the range of −3.40E+38 through 3.40E +38 in a 4-byte storage space. REAL is functionally equivalent to FLOAT(24).
ROWVERSION (ANSI/ISO SQL data type: none)
Stores a number that is unique within the database whenever a row in the table is updated. Called TIMESTAMP in earlier versions.
SMALLDATETIME (ANSI/ISO SQL data type: none)
Holds a date and time within the range of 1900-01-01 00:00 through 2079-06-06 23:59, accurate to the nearest minute. (Minutes are rounded down when seconds are 29.998 or less; otherwise, they are rounded up.) Values are stored in 4 bytes.
SMALLINT (ANSI/ISO SQL data type: SMALLINT)
Stores signed or unsigned integers in the range of −32,768 and 32,767, in 2 bytes of storage space. See INT for rules about the IDENTITY property that also apply to this type.
SMALLMONEY (ANSI/ISO SQL data type: none)
Stores monetary values within the range of −214,748.3648 to 214,748.3647, in 4 bytes of storage space.
SQL_VARIANT (ANSI/ISO SQL data type: none)
Stores values of other SQL Server-supported data types, except TEXT, NTEXT, ROWVERSION, and other SQL_VARIANT commands. Can store up to 8,016 bytes of data and supports NULL and DEFAULT values. SQL_VARIANT is used in columns, parameters, variables, and return values of functions and stored procedures.
TABLE (ANSI/ISO SQL data type: none)
Special data type that stores a result set for a later process. Used solely in procedural processing, and cannot be used in a CREATE TABLE statement. This data type alleviates the need for temporary tables in many applications. It can reduce the need for stored procedure recompiles, thus speeding execution of stored procedures and user-defined functions.
TEXT (ANSI/ISO SQL data type: CLOB)
Stores very large passages of text (up to 2,147,483,647 characters in length). TEXT and IMAGE values are often more difficult to manipulate than, say, VARCHAR values. For example, you cannot place an index on a TEXT or IMAGE column. TEXT values valuescan can be manipulated using the functions DATALENGTH, PATINDEX, SUBSTRING, TEXTPTR, and TEXTVALID as well as the commands READTEXT, SET TEXTSIZE, UPDATETEXT, and WRITETEXT.
TIME (ANSI/ISO SQL data type: TIME)
Stores an automatically generated binary number that guarantees uniqueness in the current database and is therefore different from the ANSI TIMESTAMP data type. TIME s consume 8 bytes of storage space. ROWVERSION is now preferred over TIME to uniquely track each row.
TIMESTAMP (ANSI/ISO SQL data type: TIMESTAMP)
Stores the time of day based on a 24-hour clock without time zone awareness, to an accuracy of 100 nanoseconds, in a 5-byte storage space.
TINYINT (ANSI/ISO SQL data type: none)
Stores unsigned integers within the range 0 to 255 in 1 byte of storage space. See INT for rules about the IDENTITY property that also apply to this type.
UNIQUEIDENTIFIER (ANSI/ISO SQL data type: none)
Represents a value that is globally unique across all databases and all servers. Values are represented as xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxx, where each x is a hexadecimal digit in the range 0 to 9 or a to f. The only operations allowed against UNIQUEIDENTIFIER s are comparisons and NULL checks. Column constraints and properties are allowed on UNIQUEIDENTIFIER columns, with the exception of the IDENTITY property.
VARBINARY[(n)] (ANSI/ISO SQL data type: BLOB)
Describes a variable-length binary value of up to 8,000 bytes in size. The storage space consumed is equivalent to the size of the data inserted, plus 4 bytes.
VARCHAR[(n)], CHAR VARYING[(n)], CHARACTER VARYING[(n)] (ANSI/ISO SQL data type: CHARACTER VARYING(n))
Holds fixed-length character data of 1 to 8,000 characters in length. The amount of storage space required is determined by the actual size of the value entered in bytes, not the value of n.
XML (ANSI/ISO SQL data type: XML)
Stores XML data in a column or a variable of variable size in storage space up to but not exceeding 2 gigabytes in size.

Constraints

Constraints allow you to automatically enforce the rules of data integrity and to filter the data that is placed in a database. In a sense, constraints are rules that define which data values are valid during INSERT, UPDATE, and DELETE operations. When a data-modification transaction breaks the rules of a constraint, the transaction is rejected.

In the ANSI standard, there are four constraint types: CHECK, PRIMARY KEY, UNIQUE, and FOREIGN KEY. (The RDBMS platforms may allow more; refer to Chapter 3 for details.)

Scope

Constraints may be applied at the column level or the table level:

Column-level constraints
Are declared as part of a column definition and apply only to that column.
Table-level constraints
Are declared independently from any column definitions (traditionally, at the end of a CREATE TABLE statement) and may apply to one or more columns in the table. A table constraint is required when you wish to define a constraint that applies to more than one column.

Syntax

Constraints are defined when you create or alter a table. The general syntax for constraints is shown here:

        CONSTRAINT [constraint_name] constraint_type [(column [, ...])]
        [predicate] [constraint_deferment] [deferment_timing]
        

The syntax elements are as follows:

CONSTRAINT [constraint_name]
Begins a constraint definition and, optionally, provides a name for the constraint. When you omit constraint_name, the system will create a name for you automatically. On some platforms, you may omit the CONSTRAINT keyword as well.
Note

System-generated names are often incomprehensible. It is good practice to specify human-readable, sensible names for constraints.

constraint_type
Declares the constraint as one of the allowable types: CHECK, PRIMARY KEY, UNIQUE, or FOREIGN KEY. More information about each type of constraint appears later in this section.
column [, . . . ]
Associates one or more columns with the constraint. Specify the columns in a comma-delimited list, enclosed in parentheses. The column list should be omitted for column-level constraints. Columns are not used in every constraint. For example, CHECK constraints do not generally use column references.
predicate
Defines a predicate for CHECK constraints.
constraint_deferment
Declares a constraint as DEFERRABLE or NOT DEFERRABLE. When a constraint is deferrable, you can specify that it be checked for a rules violation at the end of a transaction. When a constraint is not deferrable, it is checked for a rules violation at the conclusion of every SQL statement.
deferment_timing
Declares a deferrable constraint as INITIALLY DEFERRED or INITIALLY IMMEDIATE. When set to INITIALLY DEFERRED, the constraint check time will be deferred until the end of a transaction, even if the transaction is composed of many SQL statements. In this case, the constraint must also be DEFERRABLE. When set to INITIALLY IMMEDIATE, the constraint is checked at the end of every SQL statement. In this case, the constraint may be either DEFERRABLE or NOT DEFERRABLE. The default is INITIALLY IMMEDIATE.

Note that this syntax may vary among the different vendor platforms. Check the individual platform sections in Chapter 3 for more details.

PRIMARY KEY Constraints

A PRIMARY KEY constraint declares one or more columns whose value(s) uniquely identify each record in the table. It is considered a special case of the UNIQUE constraint. Here are some rules about primary keys:

  • Only one primary key may exist on a table at a time.
  • Columns in the primary key cannot have data types of BLOB, CLOB, NCLOB, or ARRAY.
  • Primary keys may be defined at the column level for a single column key or at the table level if multiple columns make up the primary key.
  • Values in the primary key column(s) must be unique and not NULL.
  • In a multicolumn primary key, called a concatenated key, the combination of values in all of the key columns must be unique and not NULL.
  • Foreign keys can be declared that reference the primary key of a table to establish direct relationships between tables (or possibly, though rarely, within a single table).

The following ANSI standard code includes the options for creating both a table-and column-level primary key constraint on a table called distributors. The first example shows a column-level primary-key constraint, while the second shows a table-level constraint:

        -- Creating a column-level constraint
        CREATE TABLE distributors(dist_id CHAR(4) NOT NULL PRIMARY KEY,
         dist_name VARCHAR(40),
         dist_address1 VARCHAR(40),
         dist_address2 VARCHAR(40),
         city VARCHAR(20),
         state CHAR(2) ,
         zip CHAR(5) ,
         phone CHAR(12) ,
         sales_rep INT );
        -- Creating a table-level constraint
        CREATE TABLE distributors
         (dist_id CHAR(4) NOT NULL,
         dist_name VARCHAR(40),
         dist_address1 VARCHAR(40),
         dist_address2 VARCHAR(40),
         city VARCHAR(20),
         state CHAR(2) ,
         zip CHAR(5) ,
         phone CHAR(12) ,
         sales_rep INT ,CONSTRAINT pk_dist_id PRIMARY KEY (dist_id));
      

In the example showing a table-level primary key, we could easily have created a concatenated key by listing several columns separated by commas.

FOREIGN KEY Constraints

A FOREIGN KEY constraint defines one or more columns in a table as referencing columns in a unique or primary key in another table. (A foreign key can reference a unique or primary key in the same table as the foreign key itself, but such foreign keys are rare.) Foreign keys can then prevent the entry of data into a table when there is no matching value in the related table. They are the primary means of identifying the relationships between tables in a relational database. Here are some rules about foreign keys:

  • Many foreign keys may exist on a table at a time.
  • A foreign key can be declared to reference either the primary key or a unique key of another table to establish a direct relationship between the two tables.

The full ANSI/ISO SQL syntax for foreign keys is more elaborate than the general syntax for constraints shown earlier, and it’s dependent on whether you are making a table-level or column-level declaration:

        -- Table-level foreign key
        [CONSTRAINT [constraint_name] ]
        FOREIGN KEY (local_column[, ...] )
        REFERENCES referenced_table [ (referenced_column[, ...]) ]
        [MATCH {FULL | PARTIAL | SIMPLE} ]
        [ON UPDATE {NO ACTION | CASCADE | RESTRICT |
         SET NULL | SET DEFAULT} ]
        [ON DELETE {NO ACTION | CASCADE | RESTRICT |
         SET NULL | SET DEFAULT} ]
        [constraint_deferment] [deferment_timing]
        -- Column-level foreign key
        [CONSTRAINT [constraint_name] ]
        REFERENCES referenced_table [ (referenced_column[, ...]) ]
        [MATCH {FULL | PARTIAL | SIMPLE} ]
        [ON UPDATE {NO ACTION | CASCADE | RESTRICT |
         SET NULL | SET DEFAULT} ]
        [ON DELETE {NO ACTION | CASCADE | RESTRICT |
         SET NULL | SET DEFAULT} ]
        [constraint_deferment] [deferment_timing]
      

The keywords common to a standard constraint declaration were described earlier, in the “Syntax” section. Keywords specific to foreign keys are described in the following list:

FOREIGN KEY (local_column [, . . . ])
Declares one or more columns of the table being created or altered that are subject to the foreign key constraint. This syntax is used only in table-level declarations and is excluded from column-level declarations. We recommend that the ordinal positions and data types of the columns in the local_column list match the ordinal positions and data types of the columns in the refer enced_column list.
REFERENCES referenced_table [ ( referenced_column [, . . . ]) ]
Names the table and, where appropriate, the column(s) that hold the valid list of values for the foreign key. A referenced_column must already be named in a NOT DEFERRABLE PRIMARY KEY or NOT DEFERRABLE UNIQUE KEY statement. The table types must also match; for example, if one is a local temporary table, both must be local temporary tables.
MATCH {FULL | PARTIAL | SIMPLE}

Defines the degree of matching required between the local and referenced columns in foreign-key constraints when NULLs are present:

FULL
Declares that a match is acceptable when: 1) none of the referencing columns are NULL and match all of the values of the referenced column, or 2) all of the referencing columns are NULL. In general, you should either use MATCH FULL or ensure that all columns involved have NOT NULL constraints.
PARTIAL
Declares that a match is acceptable when at least one of the referenced columns is NULL and the others match the corresponding referenced columns.
SIMPLE
Declares that a match is acceptable when any of the values of the referencing column is NULL or a match. This is the default.
ON UPDATE
Specifies that, when an UPDATE operation affects one or more referenced columns of the primary or unique key on the referenced table, a corresponding action should be taken to ensure that the foreign key does not lose data integrity. ON UPDATE may be declared independently of or together with the ON DELETE clause. When omitted, the default for the ANSI standard is ON UPDATE NO ACTION.
ON DELETE
Specifies that, when a DELETE operation affects one or more referenced columns of the primary or unique key on the referenced table, a corresponding action should be taken to ensure that the foreign key does not lose data integrity. ON DELETE may be declared independently of or together with the ON UPDATE clause. When omitted, the default for the ANSI standard is ON DELETE NO ACTION.
NO ACTION | CASCADE | RESTRICT | SET NULL | SET DEFAULT
Defines the action the database takes to maintain the data integrity of the foreign key when a referenced primary or unique key constraint value is changed or deleted:
NO ACTION
Tells the database to do nothing when a primary key or unique key value referenced by a foreign key is changed or deleted.
CASCADE
Tells the database to perform the same action (i.e., DELETE or UPDATE) on the matching foreign key when a primary key or unique key value is changed or deleted.
RESTRICT
Tells the database to prevent changes to the primary key or unique key value referenced by the foreign key.
SET NULL
Tells the database to set the value in the foreign key to NULL when a primary key or unique key value is changed or deleted.
SET DEFAULT
Tells the database to set the value in the foreign key to the default (using default values you specify for each column) when a primary key or unique key value is changed or deleted.

As with the code example for primary keys, you can adapt this generic syntax to both column-level and table-level foreign key constraints. Note that column-level and table-level constraints perform their function in exactly the same way; they are merely defined at different levels of the CREATE TABLE command. In the following example, we create a single-column foreign key on the salesrep column referencing the empid column of the employee table. We create the foreign key two different ways, the first time at the column level and the second time at the table level:

        -- Creating a column-level constraint
        CREATE TABLE distributors
         (dist_id CHAR(4) PRIMARY KEY,
         dist_name VARCHAR(40),
         dist_address1 VARCHAR(40),
         dist_address2 VARCHAR(40),
         city VARCHAR(20),
         state CHAR(2) ,
         zip CHAR(5) ,
         phone CHAR(12) ,sales_rep INT NOT
        
NULL REFERENCES employee(empid));
        
        -- Creating a table-level constraint
        CREATE TABLE distributors
         (dist_id CHAR(4) NOT NULL,
         dist_name VARCHAR(40),
         dist_address1 VARCHAR(40),
         dist_address2 VARCHAR(40),
         city VARCHAR(20),
         state CHAR(2) ,
         zip CHAR(5) ,
         phone CHAR(12) ,
         sales_rep INT ,
        CONSTRAINT pk_dist_id PRIMARY KEY (dist_id),CONSTRAINT fk_empidFOREIGN KEY (sales_rep)REFERENCES employee(empid));
        

UNIQUE Constraints

A UNIQUE constraint, sometimes called a candidate key, declares that the values in one column, or the combination of values in more than one column, must be unique. Rules concerning unique constraints include:

  • Columns in a unique key cannot have data types of BLOB, CLOB, NCLOB, or ARRAY.
  • The column or columns in a unique key may not be identical to those in any other unique keys, or to any columns in the primary key of the table.
  • A single NULL value, if the unique key allows NULL values, is allowed.
  • ANSI/ISO SQL allows you to substitute the column list shown in the general syntax diagram for constraints with the keyword (VALUE). UNIQUE (VALUE) indicates that all columns in the table are part of the unique key. The VALUE keyword also disallows any other unique or primary keys on the table.

In the following example, we limit the number of distributors we do business with to only one distributor per zip code. We also allow one (and only one) “catch-all” distributor with a NULL zip code. This functionality can be implemented easily using a UNIQUE constraint, either at the column or the table level:

        -- Creating a column-level constraint
        CREATE TABLE distributors
         (dist_id CHAR(4) PRIMARY KEY,
         dist_name VARCHAR(40),
         dist_address1 VARCHAR(40),
         dist_address2 VARCHAR(40),
         city VARCHAR(20),
         state CHAR(2) ,zip CHAR(5) UNIQUE,
         phone CHAR(12) ,
         sales_rep INT NOT NULL
         REFERENCES employee(empid));
        -- Creating a table-level constraint
        CREATE TABLE distributors
         (dist_id CHAR(4) NOT NULL,
         dist_name VARCHAR(40),
         dist_address1 VARCHAR(40),
         dist_address2 VARCHAR(40),
         city VARCHAR(20),
         state CHAR(2) ,
         zip CHAR(5) ,
         phone CHAR(12) ,
         sales_rep INT ,
        CONSTRAINT pk_dist_id PRIMARY KEY (dist_id),
        CONSTRAINT fk_emp_id FOREIGN KEY (sales_rep)
         REFERENCES employee(empid),CONSTRAINT unq_zip UNIQUE (zip));
      

CHECK Constraints

CHECK constraints allow you to perform comparison operations to ensure that values match specific conditions that you set out. The syntax for a check constraint is very similar to the general syntax for constraints:

        [CONSTRAINT] [constraint_name] CHECK (search_conditions)
        [constraint_deferment] [deferment_timing]
      

Most of the elements of the check constraint were introduced earlier in this section. The following element is unique to this constraint:

search_conditions

Specifies one or more search conditions that constrain the values inserted into the column or table, using one or more expressions and a predicate. Multiple search conditions may be applied to a column in a single check constraint using the AND and OR operators (think of a WHERE clause).

A check constraint is considered matched when the search conditions evaluate to TRUE or UNKNOWN. Check constraints are limited to Boolean operations (e.g., =, >=, <=, or <>), though they may include any ANSI/ISO SQL predicate, such as IN or LIKE. Check constraints may be appended to one another (when checking a single column) using the AND and OR operators. Here are some other rules about check constraints:

  • A column or table may have one or more check constraints.
  • A search condition cannot contain aggregate functions, except in a subquery.
  • A search condition cannot use nondeterministic functions or subqueries.
  • A check constraint can only reference like objects. That is, if a check constraint is declared on a global temporary table, it cannot then reference a permanent table.
  • A search condition cannot reference these ANSI functions: CURRENT_USER, SESSION_USER, SYSTEM_USER, USER, CURRENT_PATH, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP.

The following example adds a check constraint to the dist_id and zip columns. (This example uses generic code run on SQL Server.) The zip code must fall into the normal ranges for postal zip codes, while the dist_id values are allowed to contain either four alphabetic characters or two alphabetic and two numeric characters:

        -- Creating column-level CHECK constraints
        CREATE TABLE distributors(dist_id CHAR(4)CONSTRAINT pk_dist_id PRIMARY KEYCONSTRAINT ck_dist_id CHECK(dist_id LIKE '[A-Z][A-Z][A-Z][A-Z]' ORdist_id LIKE '[A-Z][A-Z][0-9][0-9]'),
         dist_name VARCHAR(40),
         dist_address1 VARCHAR(40),
         dist_address2 VARCHAR(40),
         city VARCHAR(20),
         state CHAR(2)
         CONSTRAINT def_st DEFAULT ("CA"),zip CHAR(5)CONSTRAINT unq_dist_zip UNIQUE
        
 CONSTRAINT ck_dist_zip CHECK(zip LIKE '[0-9][0-9][0-9][0-9][0-9]'),
        
         phone CHAR(12),
         sales_rep INT
         NOT NULL DEFAULT USER REFERENCES employee(emp_id))
      
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.58.150.59