CHAPTER 11

image

Working with Numbers

by Jonathan Gennick

SQL Server supports integer, decimal, and floating-point numbers. Working with numbers requires an understanding of the types of numbers available and what they are capable of doing. Implicit conversion rules sometimes lead to surprising results from seemingly simple-to-understand expressions. The recipes in this chapter show some of the more common operations as well as techniques for guarding against unexpected and unwanted results.

11-1. Representing Integers

Problem

You are writing T-SQL or creating a table and want to represent integer data in a binary format.

Solution

Choose one of the four integer data types provided in SQL Server. Here is a code block showing the four types and their range of valid values:

DECLARE @bip bigint, @bin bigint
DECLARE @ip int, @in int
DECLARE @sip smallint, @sin smallint
DECLARE @ti tinyint

SET @bip =  9223372036854775807   /*  2^63-1 */
SET @bin =  -9223372036854775808  /* -2^63   */
SET @ip  =            2147483647  /*  2^31-1 */
SET @in  =           -2147483648  /* -2^31   */
SET @sip =                 32767  /*  2^15-1 */
SET @sin =                -32768  /* -2^15   */
SET @ti  =                   255  /*  2^8-1  */

SELECT 'bigint' AS type_name, @bip AS max_value, @bin AS min_value
UNION ALL
SELECT 'int', @ip, @in
UNION ALL
SELECT 'smallint', @sip, @sin
UNION ALL
SELECT 'tinyint', @ti, 0
ORDER BY max_value DESC;

How It Works

SQL Server supports four integer data types. Each allocates a specific number of bytes for use in representing integer values. From largest to smallest, the types are as follows:

  • bigint (eight bytes)
  • int (four bytes)
  • smallint (two bytes)
  • tinyint (one byte)

The results from the solution example show the range of values supported by each of the types:

type_name            max_value            min_value
--------- -------------------- --------------------
bigint     9223372036854775807 -9223372036854775808
int                 2147483647          -2147483648
smallint                 32767               -32768
tinyint                    255                    0

Attempts to store an out-of-range value result in an overflow error. For example, decrement the minimum value for smallint by 1, and attempt to store that value, and you’ll get the following results:

DECLARE @sin smallint
SET @sin =               -32769
Msg 220, Level 16, State 1, Line 2
Arithmetic overflow error for data type smallint, value = -32769.

tinyint is a single byte limited to positive values only. The other three types do take negative values. SQL Server does not provide for unsigned versions of bigint, int, and smallint.

Choose from among the integer types based upon the range of values that you are working with. Don’t forget to allow for future growth. If storing the national debt, for example, you might want to jump straight to the bigint data type. Any of the types may also be used in CREATE TABLE statements. You can create table columns based upon the four types, as well as upon T-SQL variables, as shown in the example.

Image Note  The absolute value range in the negative direction is one greater than in the positive direction. That is because of the two’s-complement notation used internally by the database engine. If you’re curious, you can read more about two’s-complement in the following Wikipedia article: http://en.wikipedia.org/wiki/Two%27s_complement.

11-2. Creating Single-Bit Integers

Problem

Your application requires several on/off flags that you wish to store in the smallest possible space.

Solution

Store the flags using the type bit. For example:

DECLARE @SunnyDayFlag bit

SET @SunnyDayFlag = 1;
SET @SunnyDayFlag = 'true'

SELECT @SunnyDayFlag;

How It Works

Integers decrease in size from eight bytes to one byte as you move from bigint to tinyint. Using the bit type, you can define a column or variable that can be set to 1, 0, or null.

The values 'true' and 'false' (case-insensitive) equate to 1 and 0, respectively. SQL Server coalesces bit variables into groups of eight or fewer, storing up to eight values in a single byte.

Image Caution  While the official documentation lumps bit with the integer types, it is a type better suited for true/false flags than for numeric values you want to use in expressions.

11-3. Representing Decimal and Monetary Amounts

Problem

You are working with decimal data, such as monetary amounts, for which precise, base-10 representation is critical. You want to create a variable or table column of an appropriate type.

Solution

Use the decimal data type. Specify the total number of digits needed. Also specify how many of those digits are to the right of the decimal point. Here’s an example:

DECLARE @x0 decimal(7,0) = 1234567.
DECLARE @x1 decimal(7,1) = 123456.7
DECLARE @x2 decimal(7,2) = 12345.67
DECLARE @x3 decimal(7,3) = 1234.567
DECLARE @x4 decimal(7,4) = 123.4567
DECLARE @x5 decimal(7,5) = 12.34567
DECLARE @x6 decimal(7,6) = 1.234567
DECLARE @x7 decimal(7,7) = .1234567

The first parameter to decimal indicates the overall number of digits. The second parameter indicates how many of those digits are to the right of the decimal place.

How It Works

Choose the decimal type whenever the accurate representation of decimal values is important. You’ll be able to accurately represent values to the number of digits you specify, with none of the rounding or imprecision that often results from floating-point types and their use of base-2.

The two parameters to a decimal declaration are termed precision and scale. Precision refers to the overall number of digits. Scale refers to the location of the decimal point in respect to those digits. The default precision and scale are 18 and 0.

The number of digits of precision in the solution example is held constant at seven. The changing location of the decimal point indicates the effect of different values for scale.

Image Tip  Monetary values are a particularly good application of the decimal type. For example, a declaration of decimal(11,2) allows a range of values from -$999,999,999.99 to $999,999,999.99.

11-4. Representing Floating-Point Values

Problem

You are performing scientific calculations and need the ability to represent floating-point values.

Solution

Choose one of the floating-point types supported by SQL Server. As a practical matter, you have the following choices:

DECLARE @x1 real  /* same as float(24) */
DECLARE @x2 float /* same as float(53) */
DECLARE @x3 float(53)
DECLARE @x4 float(24)

How It Works

Table 11-1 gives the absolute-value ranges supported by the declarations in this solution. For example, the largest magnitude real number is 3.40E+38. That value can, of course, be either positive or negative. The least magnitude value other than zero that you can represent is 1.18E-38. If you must represent a value of smaller magnitude, such as 1.18E-39, you would need to look toward the float type.

All values in Table 11-1 can be either positive or negative. Storing zero is also always an option.

Table 11-1. Floating-Point Value Ranges

Declaration

Minimum Absolute Value

Maximum Absolute Value

real

1.18E-38

3.40E+38

float

2.23E-308

1.79E+308

float(53)

2.23E-308

1.79E+308

float(24)

1.18E-38

3.40E+38

You can specify float(n) using any n from 1 to 53. However, any value n from 1..24 is treated as 24. Likewise, any value n from 25..53 is treated as 53. A declaration of float(25) is thus the same as float(53).

Types real and float(24) are equivalent and require 7 bytes of storage. Types float and float(53) are equivalent and require 15 bytes of storage.

11-5. Writing Mathematical Expressions

Problem

You are working with number values and want to write expressions to compute new values.

Solution

Write any expression you like, making use of SQL Server’s supported operators and functions. For example, the expression in the following code block computes the new balance of a home loan after a payment of $500. The loan balance is $94,235.49. The interest rate is 6 percent. Twelve monthly payments are made per year.

DECLARE @cur_bal decimal(7,2) = 94235.49
DECLARE @new_bal decimal(7,2)

SET @new_bal = @cur_bal - (500.00 - ROUND(@cur_bal * 0.06 / 12.00, 2))
SELECT @new_bal;

The result will be 94206.67.

How It Works

You can write expressions of arbitrary length involving combinations of values, function calls, and operators. In doing so, you must be aware of and respect the rules of operator precedence. For example, multiplication occurs before addition, as is standard in mathematics.

Table 11-2 lists operators in order of their evaluation priority. The table lists all operators, including the nonmathematical ones.

Table 11-2. Operator Precedence in SQL Server

Priority Level

Operator

Description

1

~

Bitwise NOT

2

*, /, %

Multiply, divide, modulo

3

+, -

Positive sign, negative sign

3

+, -

Add, subtract

3

+

String concatenate

3

&, ^, |

Bitwise AND, Bitwise exclusive OR, Bitwise OR

4

=, <, <=, !<, >, >=, !>, <>, !=

Equals, less than, less than or equal, not less than, greater than, greater than or equal, not greater than, not equal, not equal

5

NOT

Logical NOT

6

AND

Logical AND

7

ALL, ANY, BETWEEN, IN, LIKE, OR, SOME

Logical OR and others

8

=

Assignment

Use parentheses to override the default priority. The solution example includes parentheses to force the monthly interest amount to be subtracted from the $500 monthly payment, leaving the amount to be applied to the principal.

(500.00 - ROUND(@cur_bal * 0.06 / 12.00 ,2))

Omit the outer parentheses, and you’ll get a very different result.

Image Tip  It’s a reasonable practice to include parentheses for clarity, especially when using operators other than the fundamental four: +, -, *, and /. Not everyone has the precedence table memorized. You can make it easy on your successors and clarify your intentions by including parentheses in cases where misinterpretation is likely.

Another issue to contend with is data-type precedence and the presence or absence of implicit conversions. Recipe 11-6, coming next, helps you guard against incorrect results from mixing data types within an expression.

11-6. Casting Between Data Types

Problem

You want to guard against trouble when writing an expression involving values from more than one data type.

Solution

Consider explicitly converting values between types to maintain full control over your expressions and their results. For example, invoke CAST and CONVERT as follows to change values from one type to another:

SELECT 6/100,
       CAST(6 AS DECIMAL(1,0)) / CAST(100 AS DECIMAL(3,0)),
       CAST(6.0/100.0 AS DECIMAL(3,2));

SELECT 6/100,
       CONVERT(DECIMAL(1,0), 6) / CONVERT(DECIMAL(3,0), 100),
       CONVERT(DECIMAL(3,2), 6.0/100.0);

The results from both these queries are as follows:

--- ----------- ------
  0    0.060000   0.06

Choose either CAST or CONVERT depending upon the importance you attach to complying with the ISO SQL standard. CAST is a standard function. CONVERT is specific to SQL Server. My opinion is to favor CAST unless you have some specific need for functionality offered by CONVERT.

How It Works

One of the most common implicit conversion errors in SQL Server is actually the result of an implicit conversion not occurring in a specific case when a cursory glance would lead one to expect it to occur. That case involves the division of numeric values written as integers, such as 6/100.

Recall the solution example from Recipe 11-5. Instead of writing the six percent interest rate as 0.06, write it as 6/100 inside parentheses. Make just that one change, and the resulting code is as follows:

DECLARE @cur_bal decimal(7,2) = 94235.49
DECLARE @new_bal decimal(7,2)

SET @new_bal = @cur_bal - (500.00 - ROUND(@cur_bal * (6/100) / 12.00 ,2))
SELECT @new_bal;

Execute this code, and the result changes from the correct result of 94206.67 as given in Recipe 11-5 to the incorrect result of 93735.49. Why the change? It’s because 6 and 100 are written with no decimal points, so they are treated as integers. Integer division then ensues. The uninitiated expects 6/100 to evaluate to 0.06, but integer division leads to a result of zero. The interest rate evaluates to zero, and too much of the loan payment is applied to the principal.

Image Caution  Keep in mind that numeric constants written without a decimal point are treated as integers. When writing an expression involving constants along with decimal values, include decimal points in your constants so they are also treated as decimals—unless, of course, you are certain you want them written as integers.

The solution in this case is to recognize that the expression requires decimal values and write either 0.06 or 6.0/100.0 instead. For example, the following version of the expression will yield the same correct results as in Recipe 11-5:

SET @new_bal = @cur_bal - (500.00 - ROUND(@cur_bal * (6.0/100.0) / 12.00 ,2))

What of the values 500.00 and 12.00? Can they be written as 500 and 12? It turns out that they can be written that way. The following expression yields correct results:

SET @new_bal = @cur_bal - (500 - ROUND(@cur_bal * (6.0/100.0) / 12 ,2))

You can get away in this case with 500 and 12, because SQL Server applies data type precedence. In the case of 500, the value being subtracted is a decimal value. Thus, the database engine implicitly converts 500 to a decimal. For much the same reason, the integer 12 is also promoted to a decimal. That conversion makes sense in this particular case, but it may not always be what you want.

Table 11-3 lists data types by precedence. Any time an operator works on values of two different types, the type lower on the scale is promoted to the type higher on the scale. If such a conversion is not what you want, or if you just want to clearly specify the conversion to remove any doubt, invoke either the CAST or the CONVERT function.

Table 11-3. Data-Type Precedence in SQL Server

Tab3

The following is one last restatement of Recipe 11-5’s solution. The original solution used ROUND to force the interest amount to two decimal places, but what was the resulting data type? Do you know? Perhaps it is better to be certain. The following code casts the result of the interest computation to the type decimal(7,2). The rounding still occurs, but this time as part of the casting operation.

DECLARE @cur_bal decimal(7,2) = 94235.49
DECLARE @new_bal decimal(7,2)

SET @new_bal = @cur_bal - (500.00 - CAST(@cur_bal * (6.0/100.0) / 12.00 AS decimal(7,2)))
SELECT @new_bal;

The result is 94206.67.

Remember in particular the tricky case of integer division in instances such as 6/100. That behavior is unintuitive and leads to many errors. Otherwise, the implicit conversions implied by the precedence levels in Table 11-3 tend to make sense and produce reasonable results. Whenever values from two types are involved in the same expression, the value of the type having the lower precedence is converted into an instance of the type having the higher precedence. Even so, I recommend explicit conversions in all but the most obvious cases. If you aren’t absolutely certain at a glance just what is occurring within an expression, then make the conversions explicit.

11-7. Converting Numbers to Text

Problem

You have numeric values that you want to represent in human-readable form.

Solution

Make use of the CONVERT function to specify one of the character types as being the target data type. The following example converts product prices and weights to strings of type NVARCHAR:

SELECT ProductID, Name,
       CONVERT(NVARCHAR, ListPrice, 1) AS 'Price',
       CONVERT(NVARCHAR, Weight) AS 'Weight'
FROM Production.Product
WHERE ListPrice > 0 AND Weight IS NOT NULL;

How It Works

You saw CONVERT used in Recipe 11-6 to convert from one number type to another, but it can also convert to text. The output from the solution example is as follows:

ProductID Name                       Price     Weight
--------- -------------------------- --------- ------
...
      719 HL Road Frame - Red, 48     1,431.50 2.16
      720 HL Road Frame - Red, 52     1,431.50 2.20
      721 HL Road Frame - Red, 56     1,431.50 2.24
      722 LL Road Frame - Black, 58   337.22   2.46
      723 LL Road Frame - Black, 60   337.22   2.48
      724 LL Road Frame - Black, 62   337.22   2.50
...

An optional third parameter provides limited control over the specific textual format that is used. Table 11-4 describes the available styles and their parameter values. The first set of style numbers applies to floating-point and real-input values; the second set applies to values in one of the monetary types.

Table 11-4. Style Values for Use with the CONVERT Function

Type Family

Style Number

Result

Floating-point

0

Gives zero to six digits and scientific notation when needed. This is the default style when floating-point values are converted to text.

 

1

Gives eight digits and scientific notation.

 

2

Gives 16 digits and scientific notation.

Money

0

Allows two decimal digits. No commas used between digit groups. This is the default style for non-floating-point conversions.

 

1

Allows two decimal digits and includes commas between digit groups.

 

2

Allows four decimal digits, but no commas.

11-8. Converting from Text to a Number

Problem

You want to compute a human-readable representation of a number to one of the binary equivalents used by SQL Server to store numeric types.

Solution

Invoke the CONVERT function and specify a numeric type as the first parameter. For example:

SELECT 0-CONVERT(DECIMAL, NationalIDNumber) AS 'Negative ID'
FROM HumanResources.Employee;

This query converts national ID numbers from text to decimal, and arbitrarily makes them negative. Results are as follows:

Negative ID
----------
 -10708100
-109272464
-112432117
...

How It Works

In Recipe 11-7 you saw CONVERT used to represent numeric values in their textual form. You can also go the opposite direction. Specify the name of your desired numeric type as the first parameter, and pass a valid text representation as the second.

11-9. Rounding

Problem

You want to round a number value to a specific number of decimal places.

Solution

Invoke the ROUND function. Here’s an example:

SELECT EndOfDayRate,
       ROUND(EndOfDayRate,0) AS EODR_Dollar,
       ROUND(EndOfDayRate,2) AS EODR_Cent
FROM Sales.CurrencyRate;

The results are as follows:

         EndOfDayRate           EODR_Dollar             EODR_Cent
--------------------- --------------------- ---------------------
               1.0002                  1.00                  1.00
                 1.55                  2.00                  1.55
               1.9419                  2.00                  1.94
               1.4683                  1.00                  1.47
               8.2784                  8.00                  8.28
...

How It Works

Invoke ROUND to round a number to a specific number of decimal places, as specified by the second argument. The solution example shows rounding both to the nearest integer (zero decimal places) and to the nearest hundredth (two decimal places).

Image Note  Digit values of 5 and higher round upward. Rounding 0.5 to the nearest integer yields 1.0 as a result.

You can invoke ROUND with a negative argument to round to the left of the decimal place. The following is an example that rounds product inventories to the nearest 10 units and to the nearest 100 units:

SELECT ProductID, SUM(Quantity) AS Quantity,
       SUM(ROUND(Quantity,-1)) as Q10,
       SUM(ROUND(Quantity,-2)) as Q100
FROM Production.ProductInventory
GROUP BY ProductID;

The results show the effects of rounding away from the decimal place:

  ProductID    Quantity         Q10        Q100
----------- ----------- ----------- -----------
          1        1085        1080        1100
          2        1109        1110        1100
          3        1352        1350        1300
          4        1322        1320        1300
...

ROUND usually returns a value. However, there is one case to beware of. It comes about because ROUND returns its result in the same data type as the input value. The following three statements illustrate the instance in which ROUND will throw an error:

SELECT ROUND(500,-3);
SELECT ROUND(500.0,-4);
SELECT ROUND(500.0,-3);

The first and second statements will return 1000 and 0.0, respectively. But the third query will throw an error as follows:

Msg 8115, Level 16, State 2, Line 2
Arithmetic overflow error converting expression to data type numeric.

ROUND(500,-3) succeeds because the input value is an integer constant. (No decimal point means that 500 is considered as an integer.) The result is thus also an integer, and an integer is large enough to hold the value 1000.

ROUND(500.0,-4) returns zero. The input value indicates a type of decimal(4,1). The value rounds to zero because the value is being rounded too far to the left. Zero fits into the four-digit precision of the implied data type.

ROUND(500.0,-3) fails because the result is 1000. The value 1000 will not fit into the implied data type of decimal(4,1). You can get around the problem by casting your input value to a larger precision. Here’s an example:

SELECT ROUND(CAST(500.0 as DECIMAL(5,1)),-3)
---------------------------------------
                                 1000.0

This time, the input value is explicitly made to be decimal(5,1). The five digits of precision leave four to the left of the decimal place. Those four are enough to represent the value 1000.

11-10. Rounding Always Up or Down

Problem

You want to force a result to an integer value. You want to always round either up or down.

Solution

Invoke CEILING to always round up to the next integer value. Invoke FLOOR to always round down to the next lowest integer value. Here’s an example:

SELECT CEILING(-1.23), FLOOR (-1.23), CEILING(1.23), FLOOR(1.23);

The results are as follows:

------- ------- ------- -------
     -1      -2       2       1

How It Works

CEILING and FLOOR don’t give quite the same flexibility as ROUND. You can’t specify a number of decimal places. The functions simply round up or down to the nearest integer, period.

You can work around the nearest integer limitation using a bit of math. For example, to invoke CEILING to the nearest cent and to the nearest hundred, use this:

SELECT CEILING(123.0043*100.0)/100.0 AS toCent,
       CEILING(123.0043/100.0)*100.0 AS toHundred;
                                 toCent                               toHundred
--------------------------------------- ---------------------------------------
                             123.010000                                   200.0

We don’t trust this technique for binary floating-point values. However, it should work fine on decimal values so long as the extra math doesn’t push those values beyond the bounds of precision and scale that the decimal can support.

11-11. Discarding Decimal Places

Problem

You want to just “chop off” the digits past the decimal point. You don’t care about rounding at all. You just want zeros.

Solution

Invoke the ROUND function using a third parameter that is nonzero. Here’s an example:

SELECT ROUND(123.99,0,1), ROUND(123.99,1,1), ROUND(123.99,-1,1);
------- ------- -------
 123.00  123.90  120.00

Do be aware that Management Studio will by default display two digits past the decimal point. You can see however, how the third parameter is causing the ROUND function to round downward to zero.

How It Works

Some database brands (Oracle, for example) implement a TRUNCATE function to eliminate values past the decimal point. SQL Server accomplishes that task using the ROUND function. Make the third parameter anything but zero, and the function will truncate rather than round.

11-12. Testing Equality of Binary Floating-Point Values

Problem

You are testing two binary floating-point values for equality, but the imprecision inherent in floating-point representation is causing values that you consider equal to be rejected as not equal.

Solution

Decide on a difference threshold below which you will consider the two values to be equal. Then test the absolute value of the difference to see whether it is less than your threshold. For example, the following example assumes a threshold of 0.000001 (one one-millionth):

DECLARE @r1 real = 0.95
DECLARE @f1 float = 0.95
IF ABS(@r1-@f1) < 0.000001
   SELECT 'Equal'
ELSE
   SELECT 'Not Equal'

The difference is less than the threshold, so the values are considered to be equal. The result is as follows:

-----
Equal

How It Works

Not all decimal values can be represented precisely in binary floating-point. In addition, different expressions that should in theory yield identical results sometimes differ by tiny amounts. The following is a query block to illustrate the problem:

DECLARE @r1 real = 0.95
DECLARE @f1 float = 0.95
SELECT @r1, @f1, @r1-@f1;

Both values are the same but not really. The results are as follows:

------------- ---------------------- ----------------------
         0.95                   0.95  -1.19209289106692E-08

The 00 fundamental problem is that the base-2 representation of 0.95 is a never-ending string of bits. The float type is larger, allowing for more bits, which is the reason for the nonzero difference. By applying the threshold method shown in the solution, you can pretend that the tiny difference does not exist.

Image Caution  The solution in this recipe represents a conscious decision to disregard small differences in order to treat two values as being equal. Make that decision while keeping in mind the context of how the values are derived and the context of the problem that is being solved.

11-13. Treating Nulls as Zeros

Problem

You are writing expressions with numeric values that might be null. You want to treat nulls as zero.

Solution

Invoke the COALESCE function to supply a value of zero in the event of a null. For example, the following query returns the MaxQty column from Sales.SpecialOffer. That column is nullable. COALESCE is used to supply a zero as an alternate value.

SELECT SpecialOfferID, MaxQty, COALESCE(MaxQty, 0) AS MaxQtyAlt
FROM Sales.SpecialOffer;

The results are as follows:

SpecialOfferID      MaxQty   MaxQtyAlt
-------------- ----------- -----------
             1        NULL           0
             2          14          14
             3          24          24
             4          40          40
             5          60          60
             6        NULL           0
             7        NULL           0
...

How It Works

COALESCE is an ISO standard function that takes as its input any number of values. It returns the first non-null value in the resulting list. The solution example invokes COALESCE to return a zero in the event MaxQty is null.

SQL Server also implements an ISNULL function, which is propriety and takes only two arguments, but otherwise is similar to COALESCE in that it returns the first non-null value in the list. You can implement the solution example using ISNULL as follows and get the same results:

SELECT SpecialOfferID, MaxQty, ISNULL(MaxQty, 0) AS MaxQtyAlt
FROM Sales.SpecialOffer;

It’s generally good practice to avoid invoking either COALESCE or ISNULL within a WHERE clause predicate. Applying functions to a column mentioned in a WHERE clause can inhibit the use of an index on the column. Here’s an example of what we try to avoid:

SELECT SpecialOfferID
FROM Sales.SpecialOffer
WHERE COALESCE(MaxQty,0) = 0;

In a case like this, we prefer to write an IS NULL predicate, as follows:

SELECT SpecialOfferID
FROM Sales.SpecialOffer
WHERE MaxQty = 0 OR MaxQty IS NULL;

We believe the IS NULL approach preserves the greatest amount of flexibility for the optimizer.

11-14. Generating a Row Set of Sequential Numbers

Problem

You need to generate a row set with an arbitrary number of rows. For example, you want to generate one row per day in the year so that you can join to another table that might be missing rows for some of the days, with the goal of ultimately creating a row set that has one row per day.

Solution

Many row-generator queries are possible. The following is one solution I particularly like. It is a variation on a technique introduced to me by database expert Vladimir Przyjalkowski in 2004. It returns rows in power-of-ten increments controlled by the number of joins that you write in the outer query’s FROM clause. This particular example returns 10,000 rows numbered from 0 to 9999.

WITH ones AS (
    SELECT *
    FROM (VALUES (0), (1), (2), (3), (4),
                 (5), (6), (7), (8), (9)) AS numbers(x)
)
SELECT 1000*o1000.x + 100*o100.x + 10*o10.x + o1.x x
FROM ones o1, ones o10, ones o100, ones o1000
ORDER BY x;

The results are as follows:

          x
-----------
          0
          1
          2
          3
...
       9997
       9998
       9999

If you like, you can restrict the number of rows returned by wrapping the main query inside of an enclosing query that restricts the results. Be sure to keep the WITH clause first. Also specify an alias for the new, enclosing query. The following example specifies n as the alias:

WITH ones AS (
    SELECT *
    FROM (VALUES (0), (1), (2), (3), (4),
                 (5), (6), (7), (8), (9)) AS numbers(x)
)
SELECT n.x FROM (
    SELECT 1000*o1000.x + 100*o100.x + 10*o10.x + o1.x x
    FROM ones o1, ones o10, ones o100, ones o1000
) n
WHERE n.x < 5000
ORDER BY x;

This version returns 5,000 rows numbered from 0 through 4999.

How It Works

Row sets of sequential numbers are handy for data densification. Data densification refers to the filling in of missing rows, such as in time series data. Imagine, for example, that you want to generate a report showing how many employees were hired on each day of the year. A quick test of the data shows that hire dates are sparse—there are only a few days in a given year on which employees have been hired. Here’s an example:

SELECT DISTINCT HireDate
FROM HumanResources.Employee
WHERE HireDate >= '2012-01-01'
  AND HireDate < '2013-01-01'
ORDER BY HireDate;

The results indicate that hires occur sparsely throughout the year:

HireDate
----------
2012-04-16
2012-05-30
2012-09-30

Using the solution query, you can create a sequence table to use in densifying the data so as to return one row per day, regardless of number of hires. Begin by creating a 1,000-row table using a form of the solution query:

WITH ones AS (
    SELECT *
    FROM (VALUES (0), (1), (2), (3), (4),
                 (5), (6), (7), (8), (9)) AS numbers(x)
)
SELECT 100*o100.x + 10*o10.x + o1.x x
INTO SeqNum
FROM ones o1, ones o10, ones o100;

Now it’s possible to join against SeqNum and use that table as the basis for generating one row per day in the year. Here’s an example:

SELECT DATEADD(day, x, '2012-01-01'), HireDate
FROM SeqNum LEFT OUTER JOIN HumanResources.Employee
     ON DATEADD(day, x, '2012-01-01') = HireDate
WHERE x < DATEDIFF (day, '2012-01-01', '2013-01-01')
ORDER BY x;

The results are as follows. The HireDate column is non-null for days on which a hire was made.

                        HireDate
----------------------- ----------
2012-01-01 00:00:00.000 NULL
2012-01-02 00:00:00.000 NULL
...
2012-04-15 00:00:00.000 NULL
2012-04-16 00:00:00.000 2012-04-16
2012-04-17 00:00:00.000 NULL
...

Add a simple GROUP BY operation to count the hires per date, and we’re done! Here’s the final query:

SELECT DATEADD(day, x, '2012-01-01'), COUNT(HireDate)
FROM SeqNum LEFT OUTER JOIN HumanResources.Employee
     ON DATEADD(day, x, '2012-01-01') = HireDate
WHERE x < DATEDIFF (day, '2012-01-01', '2013-01-01')
GROUP BY x
ORDER BY x;

Results now show the number of hires per day. The following are results for the same days as in the previous output. This time, the count of hires is zero on all days having only null hire dates. The count is 1 on May 18, 2006, for the one person hired on that date.

----------------------- -----------
2012-01-01 00:00:00.000           0
2012-01-02 00:00:00.000           0
...
2012-04-15 00:00:00.000           0
2012-04-16 00:00:00.000           1
2012-04-17 00:00:00.000           0
...

You’ll receive a warning message upon executing the final query. The message is nothing to worry about. It reads as follows:

Warning: Null value is eliminated by an aggregate or other SET operation.

This message simply indicates that the COUNT function was fed null values, and indeed that is the case. Null hire dates were fed into the COUNT function. Those nulls were ignored and not counted, which is precisely the behavior wanted in this case.

11-15. Generating Random Integers in a Row Set

Problem

You want each row returned by a query to include a random integer value. You further want to specify the range within which those random values will fall. For example, you want to generate a random number between 900 and 1,000 for each product.

Solution

Invoke the built-in RAND() function, as shown in the following example:

DECLARE @rmin int, @rmax int;
SET @rmin = 900;
SET @rmax = 1000;
SELECT Name,
       CAST(RAND(CHECKSUM(NEWID())) * (@rmax-@rmin) AS INT) + @rmin
FROM Production.Product;

You’ll get results as follows, except that your random numbers might be different from mine:

Name
-------------------------------------------------- -----------
Adjustable Race                                            939
All-Purpose Bike Stand                                     916
AWC Logo Cap                                               914
BB Ball Bearing                                            992
Bearing Ball                                               975

How It Works

RAND() returns a random float value between 0 exclusive and 1 exclusive. RAND() accepts a seed parameter, and any given seed will generate the same result. These are two characteristics you must keep in mind and compensate for as you use the function.

The following is the simplest possible invocation of RAND() in a query against Production.Product. The resulting “random” number is not very random at all. SQL Server treats the function as deterministic because of the lack of a parameter, invokes the function just one time, and applies the result of that invocation to all rows returned by the query.

SELECT Name, RAND()
FROM Production.Product;
Name
-------------------------------------------------- ----------------------
Adjustable Race                                         0.472241415009636
All-Purpose Bike Stand                                  0.472241415009636
AWC Logo Cap                                            0.472241415009636
BB Ball Bearing                                         0.472241415009636
Bearing Ball                                            0.472241415009636

What’s needed is a seed value that changes for each row. A common and useful approach is to base the seed value on a call to NEWID(). NEWID() returns a value in a type not passable to RAND(). You can work around that problem by invoking CHECKSUM() on the NEWID() value to generate an integer value acceptable as a seed. Here’s an example:

SELECT Name, RAND(CHECKSUM(NEWID()))
FROM Production.Product;
Name
-------------------------------------------------- ----------------------
Adjustable Race                                         0.943863936349248
All-Purpose Bike Stand                                  0.562297100626295
AWC Logo Cap                                            0.459806720686023
BB Ball Bearing                                         0.328415563433923
Bearing Ball                                            0.859439320073147

The NEWID() function generates a globally unique identifier. Because the result must be globally unique, no two invocations of NEWID() will return the same result. The function is therefore not deterministic, and the database engine thus invokes the RAND(CHECKSUM(NEWID())) expression anew for each row.

Now comes some math. It’s necessary to shift the random values from their just-greater-than-zero to less-than-one range into the range, in this case, of 900 to 1000. Begin by multiplying the result from RAND() by the magnitude of the range. Do that by multiplying the random values by 100, which is the difference between the upper and lower bounds of the range. Here’s an example:

DECLARE @rmin int, @rmax int;
SET @rmin = 900;
SET @rmax = 1000;
SELECT Name,
       RAND(CHECKSUM(NEWID())) * (@rmax-@rmin)
FROM Production.Product;
Name
-------------------------------------------------- ----------------------
Adjustable Race                                          12.5043506882683
All-Purpose Bike Stand                                   46.3611080374763
AWC Logo Cap                                             17.1908607269767
BB Ball Bearing                                          89.5318634996859
Bearing Ball                                               50.74511276104
...

Next is to shift the spread of values so that they appear in the desired range. Do that by adding the minimum value as shown in the following query and its output. The result is a set of random values beginning at just above 900 and going to just less than 1000.

DECLARE @rmin int, @rmax int;
SET @rmin = 900;
SET @rmax = 1000;
SELECT Name,
       RAND(CHECKSUM(NEWID())) * (@rmax-@rmin) + @rmin
FROM Production.Product;
Name
-------------------------------------------------- ----------------------
Adjustable Race                                          946.885865947398
All-Purpose Bike Stand                                   957.087533428096
AWC Logo Cap                                             924.321027483594
BB Ball Bearing                                          988.996724323006
Bearing Ball                                             943.797723186947

11-16. Reducing Space Used by Decimal Storage

Problem

You have very large tables with a great many decimal columns holding values notably smaller than their precisions allow. You want to reduce the amount of space to better reflect the actual values stored rather than the possible maximums.

Image Note  The solution described in this recipe is available only in the Enterprise Edition of SQL Server.

Solution

Enable vardecimal storage for your database. Do that by invoking sp_db_vardecimal_storage_format, as follows:

EXEC sp_db_vardecimal_storage_format 'AdventureWorks2012', 'ON'

Then estimate the amount of space to be saved per table. For example, issue the following call to sp_estimated_rowsize_reduction_for_vardecimal to determine the average row length before and after vardecimal is enabled on the Production.BillOfMaterials table:

EXEC sys.sp_estimated_rowsize_reduction_for_vardecimal 'Production.BillOfMaterials'

Your results should be similar to the following:

avg_rowlen_fixed_format avg_rowlen_vardecimal_format            row_count
----------------------- ---------------------------- --------------------
                  57.00                        56.00                 2679

A one-byte-per-row savings is hardly worth pursuing. However, pursue it anyway by enabling vardecimal storage on the table:

sp_tableoption 'Production.BillOfMaterials', 'vardecimal storage format', 1

Be aware that converting to vardecimal is an offline operation. Be sure you can afford to take the table offline for the duration of the process.

How It Works

By switching on vardecimal storage for a table, you allow the engine to treat decimal values as variable length in much the same manner as variable-length strings are treated, trading an increase in CPU time for a reduction in storage from not having to store unused bytes. You enable the use of the option at the database level. Then you can apply the option on a table-by-table basis.

While the vardecimal option sounds great on the surface, we recommend some caution. Make sure that the amount of disk space saved makes it really worth the trouble of enabling the option. Remember that there is a CPU trade-off. The example enables the option for a 2,679-row table and would save about one byte per row on average. Such a savings is fine for a book example, but it’s hardly worth pursing in real life. Go for a big win, or don’t play at all.

You can generate a list of databases on your server that shows which ones have vardecimal is enabled. Issue the following command to do that:

EXEC sp_db_vardecimal_storage_format

Your results should resemble the following. The Database Name values may be displayed extremely wide in Management Studio. You may need to scroll left and right to see the Vardecimal State values. I’ve elided much of the space between the columns in this output for the sake of readability.

Database Name          Vardecimal State
------------------     ----------------
master                 OFF
tempdb                 OFF
model                  OFF
msdb                   OFF
AdventureWorks2012     ON

Similarly, you can issue the following query to generate a list of tables within a database for which the option is enabled. (Increase the VARCHAR size in the CAST if your table or schema names combine to be longer than 40 characters.)

SELECT CAST(ss.name + '.' + so.name AS VARCHAR(40)) AS 'Table Name',
       CASE objectproperty(so.object_id, N'TableHasVarDecimalStorageFormat')
            WHEN 1 then 'ON' ELSE 'OFF'
            END AS 'Vardecimal State'
FROM sys.objects so JOIN sys.schemas ss
     ON so.schema_id = ss.schema_id
WHERE so.type_desc = 'USER_TABLE'
ORDER BY ss.name, so.name;

Your results should be similar to the following:

Table Name                               Vardecimal State
---------------------------------------- ----------------
dbo.AWBuildVersion                       OFF
dbo.DatabaseLog                          OFF
dbo.ErrorLog                             OFF
...
Production.BillOfMaterials               ON
...

To disable vardecimal storage on a table, invoke the sp_tableoption procedure with a third parameter of 0 rather than 1. Disable the option at the database level by first disabling it for all tables and then by executing sp_db_vardecimal_storage_format with a second parameter of 'OFF'.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.51.191