Chapter 10

Numeric Data in SQL

Abstract

SQL is not a computational language; the arithmetic capability of the SQL is weaker than that of almost any other language you have ever used. But there are some tricks that you need to know working with numbers in SQL and when passing them to a host program.

Keywords

NULL

Exact numeric data types

INTEGER

BIGINT

SMALLINT

NUMERIC(p, s)

DECIMAL (p, s)

BIT

Byte

BOOLEAN

Approximate numeric FLOAT(p)

REAL

DOUBLE PRECISION

IEEE floating point

Type conversions

INCITS/H2 Database Standards Committee

NULLIF()

COALESCE()

Mathematical functions

MOD()

IP addresses

Sql is not a computational language; the arithmetic capability of the SQL is weaker than that of almost any other language you have ever used. But there are some tricks that you need to know working with numbers in SQL and when passing them to a host program. Much of the arithmetic and the functions are implementations defined, so you should experiment with your particular product and make notes on the default precision, and tools in the math library of your database.

This section deals with the arithmetic that you would use across a row instead of down a column; they are not quite the same.

The SQL Standard has a very wide range of numeric types. The idea is that any host language can find an SQL numeric type that matches one of its own. Remember that SQL is meant to be used with a host language and not by itself.

Numbers in SQL are classified as either exact or approximate. An exact numeric value has a precision, p, and a scale, s. The precision is a positive integer that determines the number of significant digits in a particular radix. The standard says the radix can be either binary or decimal, so you need to know what your implementation does. The scale is a nonnegative integer that tells you how many radix places the number has.

Today, there are not many base-ten platforms, so you almost certainly have a binary machine. However, a number can have one of many binary representations — twos-complement, ones-complement, high end or low end and various word sizes. The proper mental model of numbers in SQL is not to worry about thebits and byteslevel of the physical representation, but to think in abstract terms.

10.1 Exact Numeric Data Types

The data types INTEGER, BIGINT, SMALLINT, NUMERIC(p,s), and DECIMAL(p,s) are exact numeric types. An integer has a scale of zero but the syntax simply uses the word INTEGER or the abbreviation INT but if you use it you will look like a C family programmer. SMALLINT has a scale of zero, but the range of values it can hold is less than or equal to the range that INTEGER can hold in the implementation. Likewise, BIGINT has a scale of zero, but the range of values it can hold is greater than or equal to the range that INTEGER can hold in the implementation.

DECIMAL(p,s) can also be written DEC(p,s) . For example, DECIMAL(8,2) could be used to hold the number 123456.78, which has eight significant digits and two decimal places.

The difference between NUMERIC(p,s) and DECIMAL(p,s) is subtle. NUMERIC(p,s) specifies the exact precision and scale to be used. DECIMAL(p,s) specifies the exact scale, but the precision is implementation-defined to be equal to or greater than the specified value. That means DECIMAL(p,s) can have some room for rounding and NUMERIC(p,s) does not. Mainframe COBOL programmers can think of NUMERIC(p,s) as a PICTURE numeric type, whereas DECIMAL(p,s) is like a BCD. The use of BCD is not common today, but was popular on older mainframe business computers. I recommend using DECIMAL(p,s) because it might enjoy some extra precision.

10.1.1 Bit, Byte, and Boolean Data Types

The ANSI/ISO Standards provide for BOOLEAN, BINARY and BINARY VARYING data types and operations. Machine level things like a bit or byte data type have no place in SQL and are almost never used. SQL has a three-valued logic and it does not naturally accommodate Boolean algebra. The value TRUE is greater than the value FALSE, and any comparison involving NULL or an UNKNOWN truth value will return an UNKNOWN result. But what does

SELECT
CASE WHEN ((x = 1) >= (y = 42))
THEN 'TRUE' ELSE 'FALSE' END
FROM Foobar;

mean conceptually? And are not there better ways to express the intent?

SQL is a high-level language; it is abstract and defined without regard to physical implementation. This basic principle of data modeling is called data abstraction. Bits and bytes are the lowest units of hardware-specific, physical implementation you can get. Are you on a high-end or low-end machine? Does the machine have 8, 16, 32, 64, or 128 bit words? Twos complement or ones complement math? Hey, the SQL Standards allow decimal machines, so bits do not have to exist at all!

What about NULLs in this data type? To be an SQL data type, you should have NULLs, so what is a NULL bit? By definition a bit, is in one of two states, on or off and has no NULL. If your vendor adds NULLs to bit, how are the bit-wise operations defined? Oh what a tangled web we weave when first we mix logical and physical models.

What do the host languages do? Did you know that + 1, + 0, − 0 and − 1 are all used for BOOLEANs, but not consistently? In C#, Boolean values are 0/1 for FALSE/TRUE, while VB.NET has Boolean values of 0/− 1 for FALSE/TRUE and they are proprietary languages from the same vendor. That means all the host languages—present, future and not-yet-defined—can be different.

For standard programming languages C and COBOL, BOOLEAN values are mapped to integer variables in the host language. For standard programming languages Ada, FORTRAN, Pascal, and PL/I, BOOLEAN variables are directly supported. All data types in SQL have to be NULL-able, so the SQL Standard requires that a NULL Boolean is UNKNOWN; unfortunately, this makes the behavior of the data type inconsistent. The rule for NULLs has always been that they propagate. Consider the expressions:

(1 = 1) OR NULL yields NULL which is UNKNOWN
(1 = 1) OR UNKNOWN yields TRUE
(1 = 1) AND UNKNOWN yields UNKNOWN
(1 = 1) AND NULL yields NULL which is UNKNOWN

Using assembly language style bit flags has its own problems.

There are usually two situations for using bits in practice. Either the bits are individual attributes or they are used as a vector to represent a single attribute. In the case of a single attribute, the encoding is limited to two values, which do not port to host languages or other SQL products, cannot be easily understood by an end user, and which cannot be expanded.

In the second case what some Newbies, who are still thinking in terms of second and third generation programming languages or even punch cards, do is build a vector for a series of “yes/no” status codes, failing to see the status vector as a single attribute. Did you ever play the children’s game “20 Questions” when you were young?

Imagine you have six components for a medical patient, so you allocate bits in your second-generation model of the world.

1. Patient is male

2. Patient is pregnant

3. Patient is under-weight

4. Patient is over-weight

5. Patient has a fever

6. Patient has high blood pressure

You have 64 possible bit patterns, but only some of the 64 bit patterns are valid (i.e., you cannot be pregnant and be a man; you cannot be both under-weight and over-weight). For your data integrity, you can:

(1) Ignore the problem. This is actually what most newbies do. I have spent three decades cleaning up bad SQL and I see it all the time.

(2) Write elaborate CHECK() constraints with user defined functions or proprietary bit level library functions that cannot port and that run like cold glue.

Now we add a seventh condition to the bit vector—which end does it go on? Why? How did you get it in the right place on all the possible hardware that it will ever use? Did all the code that references a bit in a word by its position do it right after the change?

You need to sit down and think about how to design an encoding of the data that is high level, general enough to expand, abstract and portable. For example, is that loan approval a hierarchical code? or a concatenation code? or a vector code? Did you provide codes for unknown, missing and N/A values? It is not easy to design such things!

BINARY and BINARY VARYING data types were meant provide a standard term for storing data in various formats that are not part of the SQL data types, such as images, video, audio and so forth.

Once upon a time, long, long time ago, programming languages were tied to the hardware. Obviously, assembly and machine languages were designed for one particular family of computers. But even the higher level languages still clung to the hardware. FORTRAN and COBOL standards were defined on the assumption that files were sequential, main storage was contiguous and all machine were binary.

The hardware affected the design of the languages. The hardware used a lot of bit flags and bit masks for control inside the machine. Flags were also used in the file system to mark deleted records and other things. Storage was expensive and very slow by today’s standards. Bit flags were small and relatively fast to write. Since this was the style of coding we saw, we tended to mimic that programming style in higher level code. The bad news is that even as our programming languages became more abstract, the programmer’s mind set did not.

When we got to SQL, we were in a very different game. The BIT data type in T-SQL and other SQL products was originally a bit in the way that computer people think of it; one or zero indicating an open or closed circuit. Code was written on the assumption that if a BIT column was not zero, it was one and if it was not one, it was zero.

Then BIT became a numeric data type. All data types in SQL are NULL-able. This change caught a lot of old programmers off-guard. Unexpected NULLs showed up. In spite of being a numeric data type, you cannot do any math on it.

This is even worse with bit masks. They are defined by hardware-specific, physical implementations. Are you on a high-end or low-end machine? Does the machine have 8, 16, 32, 64, or 128 bit words? Twos complement or ones complement math? Can a single bit in the mask be NULL-able? Hey, the SQL Standards say nothing about hardware, so bits do not have to exist at all! The most obvious problems with using an INTEGER (or other exact numeric data type) as a bit mask are:

1. The data is unreadable. Can you easily figure out what each bit means by looking at without a guide? Looking at "WHERE auction_status & 42 <> 0" is not very clear to a maintenance programmer whose languages do not use & for bit-wise conjunction. This is why we design encoding schemes like Dewey Decimal codes for libraries; they are easy to read and to use.

2. Constraints are a bitch to write. The two choices are to use a lot of proprietary bit-wise and/or operators that are hard to optimize, or to a set of INTEGER values with a [NOT] IN() predicate.

3. You are limited to two values per column. That is very restrictive; even the ISO sex code cannot fit into such a column. What is fun is using two or more fields for more values. You start off with {00, 01, 10, 11} in contiguous positions. But when need more values, the next bit is not contiguous and the predicates are really horrible.

4. Think about a “is_completed_flg” bit flag on an auction. Did it complete because a bid was accepted? Because the bid was withdrawn? Because the reserve price was not met? Because the item was withdrawn? Because it expired?

5. There is no temporal element to the bit mask (or to single bit flags). For example, a flag “is_legal_adult_flg” does not tell you if the person is 18 or 80. Nor does it tell you exactly what kind of legal adult is involved. In Texas, you can be a stripper or porn star at 18 years of age, but cannot buy or serve a drink until you are 21. A DATE for the birth date (just 3 bytes) would hold complete fact and let us compute what we need to know; it would always be correct, too. How do you know a bit flag is still current?

6. You will find out that using the flags will tend to split the status of an entity over multiple tables. Let me give an actual example in one of my consulting jobs. The client conducts on-line auctions. A request for bids is posted, then it moves thru a series of steps over time until it expires, is withdrawn or finds a winning bid. The auction can be in about a dozen states, but the state changes have an ordering. A bid cannot be rejected until it is made, and so forth.

7. In the old system, it was necessary to go to the Invoices table, and see if a payment had been made on a bid. The date of the payment of the invoice sets one of many bit flags that were supposed to tell us the status of auctions. Now go over to shipments and see that the matching shipment left after the invoice was paid. You get the idea; tracking an auction involved between eight to ten tables. The bad news was that people who worked with one of the process steps did not know or much care about the big picture. Data integrity suffered. Trying to track an auction history was hard enough, but when a flag had not be set correctly, it became a nightmare.

8. Bit flags invite redundancy. In the system I just mentioned, we had “is_active_flg” and “is_completed_flg” in the same table. A completed auction is not active and vice verse. It is the same fact in two flags. Human psychology (and the English language) prefers to hear an affirmative wording (remember the old song “Yes, we have no bananas today!”?). All of these bit flags, and sequence validation are being replaced by two sets of state transition tables, one for bids and one for shipments. For details on state transition constraints. The history of each auction is now in one place and has to follow business rules.

9. By the time you disassemble a bit mask column, and throw out the fields you did not need performance is not going to be improved over simpler data types. Remember we have 64 bit machinery today and it does comparisons quite fast.

10. Grouping and ordering on the individual fields is a real pain. Try it.

11. You have to index the whole column, so unless you luck up and have them in the right order, you are stuck with table scans.

12. Since a bit mask is not in First Normal Form (1NF), you have all the anomalies we wanted to avoid in RDBMS.

SQL is a “predicate language” and not a “flag language” like assembler. We use predicates and declarative code to discover the current state of the database in queries and to maintain data integrity in DDL. Try finding personnel that are legal adults. With a flag, I look use something like "Personnel.legal_adult_flg = CAST(1 AS BIT)" as my test. If the legal age changes to 21 for our purposes, as it did when the US raised the drinking age, then the search condition code change is easy; I have two slightly different predicates for two kinds of legal age. Some insurance benefits can apply to children up to age 25, which gives us three kinds of legal age. The assembly language programer is busy trying to add more flags to his Personnel table. Then he has to update them with my predicates before every execution of his query. This is not saving you anything.

10.2 Approximate Numeric Data Types

An approximate numeric value consists of a mantissa and an exponent. The mantissa is a signed numeric value; the exponent is a signed integer that specifies the magnitude of the mantissa. An approximate numeric value has a precision. The precision is a positive integer that specifies the number of significant binary digits in the mantissa. The value of an approximate numeric value is the mantissa multiplied by 10 to the exponent. FLOAT(p), REAL, and DOUBLE PRECISION are the approximate numeric types. There is a subtle difference between FLOAT(p), which has a binary precision equal to or greater than the value given, and REAL, which has an implementation-defined precision.

10.2.1 Float Versus Real Versus Double Precision

In the real world REAL and DOUBLE PRECISION are the IEEE Standard 754 for floating point numbers; FLOAT(p) is almost never used. IEEE math functions are built into processor chips so they will run faster than a software implementation. IEEE Standard 754 is binary and uses 32 bits for single precision and 64 bits for double precision, which is just right for personal computers and most Unix and Linux platforms.

The range for single precision numbers is approximately ± 10− 44.85 to 1038.53 and for double precision, approximately ± 10− 323.3 to 10308.3, respectively. However, there are some special values in the IEEE standard.

Zero cannot be directly represented in this format, so it is modeled as a special value denoted with an exponent field of zero and a fraction field of zero. The sign field can make this either − 0 or + 0, which are distinct values that compare as equal.

If the exponent is all zeroes, but the fraction is non-zero (else it would be interpreted as zero), then the value is a denormalized number. This denormalization has nothing to do with RDBMS and the normalization of tables. Because of the distribution of binary representations near zero, is not uniform, a “bias” has to be subtracted from these numbers to avoid gaps. The basic rules are:

1. The sign bit is 0 for positive, 1 for negative.

2. The exponent’s base is two.

3. The exponent field contains 127 plus the true exponent for single-precision, or 1023 plus the true exponent for double precision.

4. The first bit of the mantissa is typically assumed to be 1.f, where f is the field of fraction bits.

Most SQL programmers will not see problems in computation because the IEEE standards make the adjustments in software or hardware.

10.2.2 IEEE Floating Point Extensions

The two values “ + infinity” and “−infinity” are denoted with an exponent of all ones and a fraction of all zeroes. The sign bit distinguishes between negative infinity and positive infinity. Being able to denote infinity as a specific value is useful because it allows operations to continue past overflow situations. Operations with infinite values are well defined in IEEE floating point.

The value NaN (“Not a Number”) is used to represent a bit configuration that does not represent number. NaN’s are represented by a bit pattern with an exponent of all ones and a nonzero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signaling NaN).

A QNaN is a NaN with the most significant fraction bit set. QNaN’s propagate freely through most arithmetic operations. These values pop out of an operation when the result is not mathematically defined, like division by zero.

An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an exception when used in operations. SNaN’s can be handy to assign to uninitialized variables to trap premature usage. Semantically, QNaN’s denote indeterminate operations, while SNaN’s denote invalid operations.

SQL has not accepted the IEEE model for mathematics for several reasons. Much of the SQL Standard allows implementation defined rounding, truncation and precision so as to avoid limiting the language to particular hardware platforms. If the IEEE rules for math were allowed in SQL, then we need type conversion rules for infinite and a way to represent an infinite exact numeric value after the conversion. People have enough trouble with NULLs, so let’s not go there.

10.3 Numeric Type Conversions

There are a few surprises in converting from one numeric type to another. The SQL Standard left it up to the implementation to answer a lot of basic questions, so the programmer has to know his SQL package.

10.3.1 Rounding and Truncating

When an exact or approximate numeric value is assigned to an exact numeric column, it may not fit. SQL says that the database engine will use an approximation that preserves leading significant digits of the original number after rounding or truncating. The choice of whether to truncate or round is implementation-defined, however. This can lead to some surprises when you have to shift data among SQL implementations, or storage values from a host language program into an SQL table. It is probably a good idea to create the columns with more decimal places than you think you need.

Truncation is defined as truncation toward zero; this means that 1.5 would truncate to 1, and − 1.5 would truncate to − 1. This is not true for all programming languages; everyone agrees on truncation toward zero for the positive numbers, but you will find that negative numbers may truncate away from zero (i.e., − 1.5 would truncate to − 2).

SQL is also indecisive about rounding, leaving the implementation free to determine its method. There are two major types of rounding in programming.

The scientific method looks at the digit to be removed. If this digit is 0, 1, 2, 3, or 4, you drop it and leave the higher-order digit to its left unchanged. If the digit is 5, 6, 7, 8, or 9, you drop it and increment the digit to its left. This method works with a small set of numbers and was popular with FORTRAN programmers because it is what engineers use.

The commercial methods look at the digit to be removed. If this digit is 0, 1, 2, 3, or 4, you drop it and leave the digit to its left unchanged. If the digit is 6, 7, 8, or 9, you drop it and increment the digit to its left. However, when the digit is 5, you want to have a rule that will round up about half the time.

One rule is to look at the digit to the left: If it is odd, then leave it unchanged; if it is even, increment it. The “Round half to even” is called “banker’s rounding” (http://en.wikipedia.org/wiki/Rounding#Round_half_to_even). There are other versions of the decision rule, but they all try to make the rounding error as small as possible. This method works with a large set of numbers and is popular with bankers because it reduces the total rounding error in the system. This rule keeps commercial rounding symmetric. The usual ROUND() functions uses the scientific method, but you can use the MOD() function to implement the commercial method. Assume a simple list of two decimal place numbers we wish to round to one decimal place:

CREATE TABLE Accounts (amount DECIMAL(5,2) NOT NULL);
SELECT amount,
ROUND(amount, 1) AS scientific,
CASE WHEN MOD((100 * amount), 10) <> 5
THEN ROUND(amount, 1)
ELSE CASE WHEN MOD (FLOOR(MOD((10 * amount), 10)), 2) = 0
THEN TRUNC(amount, 1) -- even, down
ELSE ROUND(amount, 1) -- odd, up
END
END AS commercial
FROM Accounts;

In commercial transactions, you carry money amounts to four or more decimal places, but round them to two decimal places for display. This is a GAAP (Generally Accepted Accounting Practice) in the United States for US Dollars and laws in the European Union for working with Euros. Check with an accountant to be sure.

Statistical and scientific software has special routines to correct floating point rounding problems, but most SQL databases do not. Floating math is rare in commercial applications and most commercial computers do not have floating point processors.

10.3.2 CAST() Function

Standard SQL defined the general CAST(< cast operand > AS < data type >) function for all data type conversions. The < cast operand > to be either a < column name >, a < value expression >, or a NULL.

For numeric-to-numeric conversion, you can do anything you wish, but you have to watch for the rounding errors. The comparison predicates can hide automatic type conversions, so be careful. Some castings are not legal in particular cases (i.e. not all strings cast to numbers or temporal values).

10.4 Four Function Arithmetic

SQL was originally weaker than a pocket calculator. Today, the Standards include most of the basic math functions. The dyadic arithmetic operators +, −, *, and / stand for addition, subtraction, multiplication, and division, respectively. The multiplication and division operators are of equal precedence and are performed before the dyadic plus and minus operators.

In algebra and in some programming languages, the precedence of arithmetic operators is more strict, so those programmers have to use parentheses to force an order of exception. They use the “My Dear Aunt Sally” rule; that is, multiplication is done before division, which is done before addition, which is done before subtraction. This can lead to subtle errors.

For example, consider (largenum + largenum − largenum), where largenum is the maximum value that can be represented in its numeric data type. If you group the expression from left to right, you get ((largenum + largenum) − largenum) = overflow error! However, if you group the expression from right to left, you get (largenum + (largenum − largenum)) = largenum.

Because of these differences, an expression that worked one way in the host language may get different results in SQL and vice versa. SQL could reorder the expressions to optimize them. The best way to be safe is always to make extensive use of parentheses in complicated expressions, whether they are in the host language or in your SQL.

The monadic plus and minus signs are allowed and you can string as many of them in front of a numeric value of variables as you like. The bad news about this decision is that SQL also uses Ada-style comments, which put the text of a comment line between a double dash and a new line-character. This means that the parser has to figure out whether “--” is two minus signs or the start of a comment. Standard SQL also support C-style comment brackets (i.e., /* comment text */). Such brackets can be used in international data transmission standards which do not recognize a new line in a transmission so the double-dash convention will not work.

If both operands are exact numeric, the data type of the result is exact numeric, as you would expect. Likewise, an approximate numeric in a calculation will cast the results to approximate numeric. The kicker is in how the results are assigned in precision and scale.

Let S1 and S2 be the scale of the first and second operands, respectively. The precision of the result of addition and subtraction is implementation-defined, and the scale is the maximum of S1 and S2. The precision of the result of multiplication is implementation-defined, and the scale is (S1 + S2). The precision and scale of the result of division are implementation-defined, and so are some decisions about rounding or truncating results.

The INCITS/H2 Database Standards Committee debated about requiring precision and scales in the standard in the early days of SQL and finally gave up. This means I can start losing high-order digits, especially with a division operation, where it is perfectly legal to make all results single-digit integers.

Nobody does anything that stupid in practice. In the real world, some vendors allow you to adjust the number of decimal places as a system parameter, some default to a known number of decimal places, and some display as many decimal places as they can so that you can round off to what you want. You will simply have to learn what your implementation does by experimenting with it.

10.4.1 Arithmetic and NULLs

NULLs are probably one of the most formidable database concepts for the beginner. This book has a detailed study of how NULLs work in SQL, but this section is concerned with how they act in arithmetic expressions.

The NULL in SQL is only one way of handling missing values. The usual description of NULLs is that they represent currently unknown values that might be replaced later with real values when we know something. Missing values actually cover a lot of territory. The Interim Report 75-02-08 to the ANSI X3 (SPARC Study Group 1975) showed 14 different kinds of incomplete data that could appear as the results of operations or as attribute values. They included such things as arithmetic underflow and overflow, division by zero, string truncation, raising zero to the zero-th power, and other computational errors, as well as missing or unknown values.

The NULL is a global creature, not belonging to any particular data type, but able to replace any of their values. This makes arithmetic a bit easier to define. You have to specifically forbid NULLs in a column by declaring the column with a NOT NULL constraint. But in Standard SQL you can use the CAST function to declare a specific data type for a NULL, such as CAST (NULL AS INTEGER). One reason for this convention is completeness; another is to let you pass practicable implementation information about how to create a column to the database engine.

The basic rule for math with NULLs is that they propagate. An arithmetic operation with a NULL will return a NULL. That makes sense; if a NULL is a missing value, then you cannot determine the results of a calculation with it. However, the expression (NULL / 0) looks strange to people. The first thought is that a division by zero should return an error; if NULL is a missing value to be determined later, there is no value to which it can resolve and make that expression valid. However, SQL propagates the NULL, while a non-NULL divided by zero will cause a runtime error. NULL has no expectation of ever resolving to a value; it is “nonvalue”, not just a place marker.

10.5 Converting Values to and from NULL

Since host languages do not support NULLs, the programmer can elect either to replace them with another value that is expressible in the host language or to use INDICATOR variables to signal the host program to take special actions for them.

An indicator parameter is an integer host parameter that is specified immediately following another host parameter. When the first host parameter gets a NULL, the indicator is set as a negative value. Indicators also show positive numbers to show string data truncation occurred during a transfer between a host program and SQL. A zero means there were no problems with the conversion.

10.5.1 NULLIF() Function

Standard SQL specifies two functions, NULLIF() and the related COALESCE(), that can be used to replace expressions with NULL and vice versa. They are part of the CASE expression family. The NULLIF(V1, V2) function has two parameters. It is equivalent to the following CASE expression:

NULLIF(V1, V2) := CASE
WHEN (V1 = V2)
THEN NULL
ELSE V1 END;

That is, when the first parameter is equal to the second, the function returns a NULL; otherwise, it returns the first parameter’s value. The properties of this function allow you to use it for many purposes. The important properties are these:

(1) NULLIF(x, x) will return NULL for all values of x. This includes NULL, since (NULL = NULL) is UNKNOWN, not TRUE.

(2) NULLIF(0, (x − x)) will convert all non-NULLs of a numeric x into NULL. But it will convert a NULL into zero, since (NULL − NULL) is NULL and the equality test will fail.

(3) NULLIF(1, (x − x + 1)) will convert all non-NULLs of x into NULL. But it will convert a NULL into a 1. This can be generalized for all numeric data types and values.

10.5.2 COALESCE() Function

The COALESCE(< value expression >, …, < value expression >) function scans the list of < value expression > s from left to right, determines the highest data type in the list and returns the first non-NULL in the list, casting it to the highest data type. If all the < value expression > s are NULL, the result is NULL.

The most common use of this function in math expressions is in a SELECT list where there are columns that have to be added, but one can be a NULL. For example, to create a report of the total pay for each employee, you might write this query:

SELECT emp_nbr, emp_name, (salary_amt + commission_amt) AS pay_tot
FROM Personnel;

But salesmen may work on commission_amt only or on a mix of salary_amt and commission_amt. The office staff is on salary_amt only. This means an employee could have NULLs in his salary_amt or commission_amt column, which would propagate in the addition and produce a NULL result. A better solution would be

SELECT emp_nbr, emp_name
COALESCE(salary_amt, 0.00) + COALESCE(commission_amt, 0.00))
AS paycheck_amt
FROM Personnel;

As an example of the use of COALESCE(), create a table of payments made for each month of a single year. (Yes, this could be done with a column for the months, but bear with me.)

CREATE TABLE Payments
(cust_nbr INTEGER NOT NULL,
jan DECIMAL(8,2),
feb DECIMAL(8,2),
mar DECIMAL(8,2),
apr DECIMAL(8,2),
may DECIMAL(8,2),
jun DECIMAL(8,2),
jul DECIMAL(8,2),
aug DECIMAL(8,2),
sep DECIMAL(8,2),
oct DECIMAL(8,2),
nov DECIMAL(8,2),
"dec" DECIMAL(8,2), -- DEC is a reserved word
PRIMARY KEY cust_nbr);

The problem is to write a query that returns the customer and the amount of the last payment he made. Unpaid months are shown with a NULL in them. We could use a COALESCE function like this:

SELECT cust_nbr,
COALESCE ("dec", nov, oct, sep,
aug, jul, jun, may, apr, mar, feb, jan)
FROM Payments;

Of course this query is a bit incomplete, since it does not tell you in what month this last payment was made. This can be done with the rather ugly-looking expression that will turn a month’s non-NULL payment into a character string with the name of the month. The general case for a column called “mon”, which holds the number of a month within the year, is NULLIF (COALESCE (NULLIF (0, mon-mon), ‘Month’), 0) where ‘Month’ is replaced by the string for the actual name of the particular month. A list of these statements in month order in a COALESCE will give us the name of the last month with a payment. The way this expression works is worth working out in detail.

Case 1: mon is a numeric value

NULLIF(COALESCE(NULLIF(0, mon - mon), 'Month'), 0)
NULLIF(COALESCE(NULLIF(0, 0), 'Month'), 0)
NULLIF(COALESCE(NULL, 'Month'), 0)
NULLIF('Month', 0)
('Month')

Case 2: mon is NULL

NULLIF(COALESCE(NULLIF(0, mon-mon), 'Month'), 0)
NULLIF(COALESCE(NULLIF(0, NULL-NULL), 'Month'), 0)
NULLIF(COALESCE(NULLIF(0, NULL), 'Month'), 0)
NULLIF(COALESCE(0, 'Month'), 0)
NULLIF(0, 0)
(NULL)

You can do a lot of work by nesting SQL functions. LISP programmers are used to thinking this way, but most procedural programmers are not. It just takes a little practice and time.

10.6 Mathematical Functions

The SQL:2003 Standard extended the original four-function math to include a small library of functions. Most of them have been in actual products for decades. SQL is not a computational language, so it should not have a math function library like, say, FORTRAN. Nor a string function library like ICON.

10.6.1 Number Theory Operators

ABS(n) = Absolute value function. Returns the absolute value of n. If (n) is NULL, then the result is NULL.

SIGN(n) = Signum function. Returns − 1 if n is negative, 0 if n is zero and + 1 if n is positive. If (n) is NULL, then the result is NULL. This function is the “signum” in mathematical terminology.

MOD(n, m) = modulo or remainder function. If either n or m is NULL, then the result is NULL. If m is zero, then we get a division by zero exception. Otherwise, the result is the unique non-negative exact numeric value r with scale zero such that.

(1) r has the same sign as n.

(2) the absolute value of r is less than the absolute value of m.

(3) n = m * k + r for some exact numeric value k with scale zero.

This is tricky when the values of n and m are not cardinals (i.e., positive, nonzero integers). Experiment and find out how your package handles negative numbers and decimal places. In particular, many other procedural languages have slightly different definitions. If you are foolish enough to use “features” that allow other programming languages to be embedded in the DDL, then you cannot have consistent data. This was a major issue for the Pascal at one time, among others.

In 1996 September, Len Gallagher of NIST proposed an amendment for the MOD function in the SQL3 working papers. Originally, the working draft defined MOD(n, m) only for positive values of both m and n, and leaves the result to be implementation-dependent when either of m or n is negative.

Negative values of n have no required mathematical meaning and that many implementations of MOD either do not define it at all, or give some result that is the easiest to calculate on a given hardware platform.

However, negative values for (m) do have a very nice mathematical interpretation that we wanted to see preserved in the SQL definition of MOD(). Len propose the following:

(1) If n is positive, then the result is the unique non_negative exact numeric quantity r with scale zero such that r is less than m and n = (m * k) + r for some exact numeric quantity k with scale zero.

(2) Otherwise, the result is an implementation-defined exact numeric quantity r with scale zero which satisfies the requirements that r is strictly between m and (− m), and that n = (m * k) + r for some exact numeric quantity k with scale zero, and a completion condition is raised: warning—implementation-defined result.

This definition guarantees that the MOD() function, for a given positive value of n, will be a homomorphism under addition from the mathematical group of all integers, under integer addition, to the modular group of integers {0, 1…, m − 1} under modular addition. This mapping then preserves the following group properties:

(1) The additive identity is preserved: MOD(0, m) = 0

(2) Additive inverse is preserved in the modular group defined by MOD(− MOD(n, m), m) = m − MOD(n, m): MOD(− n, m) = − MOD(n, m)

(3) The addition property is preserved where “{{ circled plus sign }}” is modular addition defined by MOD((MOD(m, m) + MOD(n, m)), m). MOD((m + n), m) = MOD(m, m) {{ circled plus sign }} MOD(n, m)

(4) Subtraction is preserve under modular subtraction, which is defined as MOD((MOD(m, m) {{ circled minus sign }} MOD(n, m)), m)

MOD(m − n, m) = MOD(m, m) {{ circled minus sign }} MOD(n, m)

From this definition, we would get the following:

MOD(12, 5) = 2

MOD(− 12, 5) = 3

There are some applications where the “best” result to MOD(− 12, 5) might “−2” or “−3” rather than “3”; and that is probably why various implementations of the MOD function differ. But the advantages of being able to rely on the above mathematical properties outweigh any other considerations. If a user knows what the SQL result will be, then it is easy to modify the expressions of a particular application to get the desired application result. Here is a chart of the differences in some SQL implementations.

TestmnType AType BType CProposal
a1252222
b− 125− 2− 2− 23
c− 12− 5− 2− 2(− 2, 3)(2, − 3)
d− 12− 5222− 2
eNULL5NULLNULLNULLNULL
fNULLNULLNULLNULLNULLNULL
g12NULLNULLNULLNULLNULL
h12012NULLerror12
i− 120− 12NULLerror− 12
j050000
k0− 50000

t0010

Type A:

Oracle 7.0 and Oracle 8.0

Type B:

DataFlex—ODBC:

SQL Server 6.5, SP2

SQLBase Version 6.1 PTF level 4

Xbase

Type C:

DB2/400, V3r2:

DB2/6000 V2.01.1

Sybase SQL Anywhere 5.5

Sybase System 11

10.6.2 Exponential Functions

The exponential functions use powers and roots, so they are concerned with floating point numbers

POWER(x, n) = Raise the number x to the n-th power. If either parameter is NULL, then the result is NULL. If x is zero and n is negative, then an exception condition is raised: data exception — invalid. A non-negative number to the zero power is always one. and VE is positive, then the result is zero.

SQRT(x) = Return the square root of x. It is defined as a shorthand for POWER (x, 0.5).

LN(x) = Natural logarithm of x. If x is zero or negative, then an exception condition is raised: data exception—invalid argument for natural logarithm.

EXP(x) = Returns the constant e (~ 2.71828182845904523536.) to the x power; the inverse of a natural logarithm. If x is NULL then the result is NULL. If the result is not representable in the declared type of the result, then an exception is raised.

10.6.3 Scaling Functions

FLOOR(x) = The largest integer less than or equal to x. If x is NULL then the result is NULL.

CEILING(x) = The smallest integer greater than or equal to x. If x is NULL then the result is NULL.

While not part of the Standards, these are very common in actual products. They can be written with multiplication and division, which would be subject the local truncation and rounding rules of their product.

ROUND(x, p) = Round the number x to p decimal places. If either parameter is NULL, the result is NULL.

TRUNCATE(x, p) = Truncate the number x to p decimal places. If either parameter is NULL, the result is NULL.

10.6.4 Other Mathematical Functions

Vendors often include trigonometry and other math functions. These were easy to add from existing compilers for other languages, but are not part of the standard and do not port very well. Two proprietary functions that use an expression list are:

LEAST (< expression list >) = The expressions have to be of the same data type. This function returns the lowest value, whether numeric, temporal or character.

GREATEST(< expression list >) = As above, but it returns the highest value.

These functions are in MySQL, Oracle, Mimer and other SQL products, but are often mimicked with CASE expressions in actual code.

10.6.5 Converting Numbers to Words

A common function in report writers converts numbers into words so that they can be used to print checks, legal documents and other reports. This is not a common function in SQL products, nor is it part of the standards.

A method for converting numbers into words using only standard SQL by Stu Bloom follows. This was posted on 2002 Jan 02 on the SQL Server Programming newsgroup. First, create a table

CREATE TABLE Nbr_Words
number INTEGER PRIMARY KEY,
word VARCHAR(30) NOT NULL);

Then populate it with the literal strings of Nbr_Words from 0 to 999. Assuming that your range is 1-999, 999, 999 use the following query; it should be obvious how to extend it for larger numbers and fractional parts.

CASE WHEN :num < 1000
THEN (SELECT word FROM Nbr_Words
 WHERE number = :num)
WHEN :num < 1000000
THEN (SELECT word FROM Nbr_Words
WHERE number = :num / 1000)
|| ' thousand '
|| (SELECT word FROM Nbr_Words
WHERE MOD (number = :num, 1000))
WHEN :num < 1000000000
THEN (SELECT word FROM Nbr_Words
WHERE number = :num / 1000000)
|| ' million '
|| (SELECT word FROM Nbr_Words
WHERE number = OD((:num / 1000), 1000))
|| CASE WHEN MOD((:num / 1000), 1000) > 0
THEN ' thousand '
ELSE '' END
|| (SELECT word FROM Nbr_Words
WHERE number = MOD(:num, 1000))
END;

Whether 2500 is “Twenty-Five Hundred” or “Two Thousand Five Hundred” is a matter of taste and not science. This can be done with a shorter list of words and a different query, but this is probably the best compromise between code and the size of the table.

10.7 IP Addresses

While they are not a numeric data type, IP addresses are common enough to deserve a section and this is the best place to put it in this book.

Internet Protocol version 6 (IPv6) replaces Internet Protocol version (IPv4), which was made up of four integers each in the range 0 to 255, separated by dots. The problem was that we are running out of IP space. Version 6 requires eight sets of four hexadecimal digits separated by colons.

While technically a numeric data type, IP addresses are stored as binary and displayed with digital strings. IPv6 was defined in December 1998 by the Internet Engineering Task Force (IETF) with the publication of an Internet standard specification, RFC 2460. There was no version five; it was an experimental flow-oriented streaming protocol (Internet Stream Protocol) intended to support video and audio.

The new standard uses of a 128-bit address, whereas IPv4 uses only 32 bits. There are a lot of details and new things in the standards, but I do not care about them for this discussion; this is only about data representation.

Most of the world is still on IPv4 and it will be awhile before we are converted to the new standard.

10.7.1 CHAR(39) Storage

You could keep the IP address as a CHAR (39) that is (8 * 4 digits + 7 colons) and an easy regular expression in the DDL. The main advantage is that this is human readable and binary is not. But it is legal to drop leading zeroes in each group for readability. While that is a good goal, it makes comparisons a bit harder.

10.7.2 Binary Storage

Most current hardware supports 64 bit integers, but not 128 bits. Thankfully, the IPv6 standard uses a host identifier portion of 64 bits to facilitate an automatic mechanism for forming the host identifier from Link Layer media addressing information (MAC address). It is possible to use two BIGINTs for the data.

10.7.3 Separate SMALLINTs

The IP address is displayed as groups which each have meaning in the system, so we can model an IP address in separate columns. Notice that the IP address is still an atomic data element, but it is being modeled as scalar values. Check that you have such a data type in your product; if not, you can define it as

CREATE DOMAIN SmallInt
AS INTEGER DEFAULT 0 CHECK (VALUE BETWEEN 0 AND 65535);

Then use that data type to declare a nonscalar atomic data element, thus:

ip1 SMALLINT NOT NULL,
ip2 SMALLINT NOT NULL,
..
ip8 SMALLINT NOT NULL

The trick here is to index the octets in reverse order, since the final grouping is the most selective.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.93.169