CHAPTER 12

image

Reusable Standard Database Components

One little spark, of inspiration, is at the heart, of all creation. Right at the start, of everything that’s new. One little spark, lights up for you.

— The Sherman Brothers for Disney

As we near the end of the database design process, the database is pretty much completed from a design standpoint. We have spent time looking at performance, concurrency, and security patterns that you can follow to help implement the database in a manner that will work well under almost any typical OLTP-style load. In this chapter (and again somewhat in the next two), we are going to look at applying “finishing touches” to the database that can be used to enhance the user experience and assist with the querying and maintaining of the database. We won’t flesh out all of the details for every concept in this chapter, because a lot of the examples include parts of the system that are decidedly more aligned to the DBA than the architect/programmer (or are just far too large to fit in this book). However, the point I will be making here is that our goal is to end up with a self-contained database container with as much of the database coding and maintenance functionality as possible. In Chapter 9, we introduced the concept of a contained database to help maintain a secure and portable database, and here I will present some additional add-in capabilities and expansion-point ideas to make your database that much more usable.

The reality of database design is that most databases are rarely cookie-cutter affairs. Most companies, even when they buy a third-party package to implement some part of their business, are going to end up making (in many cases substantial) customizations to the database to fit their needs. If you are starting a very large project, you may even want to look at previous models or perhaps pre-built “universal data models,” such as those in Len Silverston’s series of books, the first of which is The Data Model Resource Book: A Library of Universal Data Models for All Enterprises (Wiley, 2001) (perhaps the only book on database design with a longer title than the book you hold in your hands right now). Karen Lopez (@datachick on Twitter) frequently speaks on the subject of universal models in the SQL PASS universe of presenters that I am generally involved with. Even these universal models may only be useful as starting points to help you map from your “reality” to a common view of a given sort of business.

In this chapter, however, I want to explore the parts of the database that I find to be useful and almost always the same for every database I create. Not every database will contain all of what I will present, but when I need a common feature I will use the exact same code in every database, with the obvious caveat that I am constantly looking for new ways to improve almost everything I use over time (not to mention it gives me something to write about when I run out of Lego sets to build). Hence, sometimes a database may use an older version of a feature until it can be upgraded. I will cover the following topics:

  • Numbers table : A table of numbers, usually integers, that can be put to a number of interesting uses (few of them mathematics related).
  • Calendar table : A table where every row represents a day, assisting in grouping data for queries, both for reports and operational usage.
  • Utility objects : Every programmer has code that they use to make their job easier: utilities to monitor usage of the system and extended DDL to support operations that aren’t part of the base DDL in T-SQL.
  • Logging objects : Utilities to log the actions of users in the database, generally for system management reasons. A common use is an error log to capture when and where errors are occurring.
  • Other possibilities: In this section, I will present a list of additional ideas for extending your databases in ways that give you independent databases with a common implementation.

Not every database can look alike; even two that do almost exactly the same thing will rarely be all that similar unless the designer is the same. But by following the patterns of implementation we have discussed throughout the book thus far and the practices we will discuss in this chapter, we can produce databases that are similar enough that the people supporting your work will have an easier time figuring out what you had in mind.

If you are dealing with a third-party system where it is forbidden to add any of your own objects, even in a schema that is separated from the shipped schemas, don’t think that everything I am saying here doesn’t apply to you. All of the example code presented supposes a single-database approach. In such cases, a common approach is to create a companion database where you locate the code you need to access the third-party database from the database tier. Some examples would need to be slightly reworked for that model, but the rework would be minimal.

image  Note    For the examples in this chapter, I am going to use a copy of the AdventureWorks2012 database, to stick to the supposition of the chapter that we should place these tools in the same database as the data you are working with. If you are working with a community version of AdventureWorks2012 that you cannot modify, you can build your own companion database for the examples. I will include a comment in each query to note where the data specifically comes from that database. In cases where cross-database access will not be trivial, I will note that in the code with a comment.

Numbers Table

A numbers table is a precalculated table of numbers, primarily non-negative integers, which you can use for some purpose. The name “numbers” is pretty open ended, but getting so specific as nonNegativeIntegers is going to get you ridiculed by the other programmers on the playground. In previous editions of the book I have used the name sequence, but with the addition of the sequence object, the name “numbers” was the next best thing. We will use the numbers table when we need to work with data in an ordered manner, particularly a given sequence of numbers. For example, if you needed a list of the top ten products sold and you only sold six, you would have to somehow manufacture four more rows for display. Having a table where you can easily output a sequence of numbers is going to be a very valuable asset at times indeed.
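For instance, here is a minimal sketch of that top ten scenario. The #ProductSale temp table and its values are hypothetical (not AdventureWorks objects), and the query uses the Tools.Number table we will build later in this section; the LEFT JOIN from the numbers table manufactures the four missing display rows:

 CREATE TABLE #ProductSale (ProductName varchar(30) NOT NULL, SalesCount int NOT NULL);
 INSERT INTO #ProductSale
 VALUES ('Widget',100),('Gadget',80),('Sprocket',60),
        ('Gizmo',40),('Doohickey',20),('Whatsit',10);

 SELECT Number.I AS Position, ProductSale.ProductName, ProductSale.SalesCount
 FROM   Tools.Number
          LEFT JOIN (SELECT ProductName, SalesCount,
                            ROW_NUMBER() OVER (ORDER BY SalesCount DESC) AS RowNbr
                     FROM   #ProductSale) AS ProductSale
            ON ProductSale.RowNbr = Number.I
 WHERE  Number.I BETWEEN 1 AND 10 --always returns ten rows; the last four have NULL values
 ORDER  BY Number.I;

 DROP TABLE #ProductSale;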

While you can make the numbers table contain any type of numbers you may need, usually it is just a simple table of non-negative integers from 0 to some reasonable limit, where reasonable is more or less how many you find you need. I generally load mine by default up to 99999, which gives you a full five digits and is a very convenient number for the query I will use to load the table. With the algorithm I will present, you can easily expand to create a sequence of numbers that is larger than you can store in SQL Server.

There are two really beautiful things behind this concept. First, the table of non-negative integers has some great uses when dealing with text data, as well as for doing all sorts of math. Second, you can create additional attributes or even other sequence tables to represent other sets of numbers that you find useful or interesting. For example:

  • Even or odd, prime, squares, cubes, and so on
  • Other ranges or even other grains of values, for example, (-1, -.5, 0, .5, 1)
  • Letters of the alphabet

In the examples in this section, we will look at several techniques you may find useful, and possibly quite often. The code to generate a simple numbers table of integers is pretty simple, though it looks a bit daunting the first time you see it. It is quite fast to execute in this form, but no matter how fast it may seem, it is not going to be faster than querying a table that has the sequence of numbers precalculated and stored ahead of time.

 ;WITH digits (I) AS

     (--set up a set of numbers from 0-9

      SELECT I

      FROM (VALUES (0),(1),(2),(3),(4),

      (5),(6),(7),(8),(9)) AS digits (I))

 ,integers (I) AS (

    SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I)

      -- + (10000*D5.I) + (100000*D6.I)

    FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3

    CROSS JOIN digits AS D4

      --CROSS JOIN digits AS D5 CROSS JOIN digits AS D6

 )

 SELECT I

 FROM integers

 ORDER BY I;

This code will return a set of 10,000 rows, as follows:

 I

 -----

 0

 1

 2

 …

 9998

 9999

Uncommenting the code for the D5 and D6 tables will give you an order of magnitude increase for each, up to 1,000,000 rows (0 through 999,999). The code itself is pretty interesting (this isn’t a coding book, but it is a really useful technique). Breaking the code down, you get the following:

 ;WITH digits (I) AS

    (--set up a set of numbers from 0-9

      SELECT I

      FROM (VALUES (0),(1),(2),(3),(4),

      (5),(6),(7),(8),(9)) AS digits (I))

This is simply a set of ten rows from 0 to 9. The next bit is where the true brilliance begins. (No, I am not claiming I came up with this; I first saw it on Erland Sommarskog’s web site a long time ago, using a technique I will show you in a few pages to split a comma-delimited string.) You cross-join the first set over and over, multiplying each level by a greater power of 10. The result is that you get one permutation for each number. For example, since 0 is in each set, you get one permutation that results in 0. You can see this better in the following smaller set:

 ;WITH digits (I) AS (--set up a set of numbers from 0-9

     SELECT i

     FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))

 SELECT D1.I AS D1I, 10*D2.I AS D2I, D1.I + (10*D2.I) AS [Sum]

 FROM digits AS D1 CROSS JOIN digits AS D2

 ORDER BY [Sum];

This returns the following, and you can see that by multiplying the D2.I value by 10, you get the tens place repeated, giving you a very powerful mechanism for building a large set. In the full query, each additional digits table reference has another power of ten in the SELECT clause multiplier, allowing you to create a very large set (rows removed and replaced with ... for clarity and to save a tree):

D1I D2I Sum
------------- ------------- -------------
0 0 0
1 0 1
2 0 2
3 0 3
4 0 4
5 0 5
6 0 6
7 0 7
8 0 8
9 0 9
0 10 10
1 10 11
2 10 12
3 10 13
4 10 14
...
6 80 86
7 80 87
8 80 88
9 80 89
0 90 90
1 90 91
2 90 92
3 90 93
4 90 94
5 90 95
6 90 96
7 90 97
8 90 98
9 90 99

This kind of combination of sets is a very useful technique in relational coding. As I said earlier, this isn’t a query book, but I feel it necessary to show you the basics of why this code works, because it is a very good mental exercise. Using the full query, you can create a sequence of numbers that you can use in a query.

So, initially create a simple table named Number with a single column I (so named because it is a typical letter used in math to denote an index in a sequence, such as xi, where the i denotes a position in a sequence of values of x; the primary purpose of the Number table is to introduce an ordering to a set to assist in an operation). I will create this table in a schema named Tools to contain the types of tool objects, functions, and procedures we will build in this chapter. In all likelihood, this is a schema you would grant EXECUTE and SELECT on to public and make the tools available to any user you have given ad hoc query access to.

 USE AdventureWorks2012;

 GO

 CREATE SCHEMA Tools;

 GO

 CREATE TABLE Tools.Number

 (

    I int NOT NULL CONSTRAINT PKTools_Number PRIMARY KEY

 );

Then I will load it with integers from 0 to 99999.

 ;WITH digits (I) AS (--set up a set of numbers from 0-9

    SELECT I

    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))

 --builds a table from 0 to 99999

 ,Integers (I) AS (

    SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I) + (10000*D5.I)

    --+ (100000*D6.I)

      FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3

     CROSS JOIN digits AS D4 CROSS JOIN digits AS D5

     /* CROSS JOIN digits AS D6 */)

 INSERT INTO Tools.Number(I)

 SELECT I

 FROM   Integers;

So if you wanted to count the integers between 1 and 1000 (inclusive), it is as simple as:

 SELECT COUNT(*)

 FROM   Tools.Number

 WHERE  I between 1 and 1000;

Of course, that would be a bit simplistic, and there had better be 1000 values between (inclusive) 1 and 1000, but what if you wanted the number of integers between 1 and 1000 that are divisible by 9 or 7?

 SELECT COUNT(*)

 FROM   Tools.Number

 WHERE  I BETWEEN 1 AND 1000

     AND (I % 9 = 0 OR I % 7 = 0);

This returns the obvious answer: 238. Sure, a math nerd could sit down and write a formula to do this, but why? And if you find you need these values quite often, you could create a table called Tools.DivisibleByNineAndSevenNumber, or add columns to the Number table called DivisibleByNineFlag and DivisibleBySevenFlag if you also needed, in the same context, the integers that were not divisible by 9 or 7. The simple numbers table is the most typical need, but you can make a table of any sort of numbers you need (prime numbers? squares? cubes?). The last example of this section is an esoteric demonstration of the pretty (nerdy) fun stuff you can do with a table of numbers, but for OLTP use, the goal will be (as we discussed in Chapter 5 on normalization) to pre-calculate values only when they are used often and can never change. Numbers-type tables are an excellent candidate for storing pre-calculated values, because the set of integers and the set of prime numbers are the same now as they were back in 300 BC when Euclid was working with them.
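As a sketch of that idea (the flag column names come from the preceding paragraph, but the implementation as persisted computed columns is just one way to do it, and these columns are not used again in this chapter):

 ALTER TABLE Tools.Number
    ADD DivisibleByNineFlag AS CASE WHEN I % 9 = 0 THEN 1 ELSE 0 END PERSISTED,
        DivisibleBySevenFlag AS CASE WHEN I % 7 = 0 THEN 1 ELSE 0 END PERSISTED;

 --the earlier count then becomes a simple flag-based filter
 SELECT COUNT(*)
 FROM   Tools.Number
 WHERE  I BETWEEN 1 AND 1000
   AND  (DivisibleByNineFlag = 1 OR DivisibleBySevenFlag = 1);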

In this section, I will present the following uses of the numbers table to get you going:

  • Determining the contents of a string : Looking through a string without looping by using the ordered nature of the numbers table to manage the iteration using relational code.
  • Determining gaps in a sequence : Having a set of data that contains all of the values allows you to use relational subtraction to find missing values.
  • Separating comma-delimited items : Sometimes data is not broken down into scalar values like you desire.
  • Stupid mathematic tricks: I take the numbers table to abstract levels, solving a fairly complex math problem that, while not terribly applicable in a practical manner, serves as an experiment to build upon if you have similar, complex problem-solving needs.

Determining the Contents of a String

As a fairly common example usage, it sometimes occurs that a value in a string you are dealing with is giving your code fits, but it isn’t easy to find what the issue is. If you want to look at the Unicode (or ASCII) value for every character in a string, you can do something like the following:

 DECLARE @string varchar(20) = 'Hello nurse!';

 SELECT Number.I as Position,

    SUBSTRING(split.value,Number.I,1) AS [Character],

    UNICODE(SUBSTRING(split.value,Number.I,1)) AS [Unicode]

 FROM Tools.Number

      CROSS JOIN (SELECT @string AS value) AS split

 WHERE Number.I > 0 --No zeroth position

      AND Number.I <= LEN(@string)

 ORDER BY Position;

This returns the following:

Position Character Unicode
------------- ------------- -------------
1 H 72
2 e 101
3 l 108
4 l 108
5 o 111
6 32
7 n 110
8 u 117
9 r 114
10 s 115
11 e 101
12 ! 33

This in and of itself is interesting, and sometimes when you execute this, you might see a little square character that can’t be displayed and a really large/odd Unicode value (like 20012, picking one randomly) that you didn’t expect in your database of English-only words. What really makes the technique awesome is that not only did we not have to write a routine to go character by character, we won’t have to go row by row either. Using a simple join, you can easily do this for a large number of rows at once, this time joining to a table in the AdventureWorks2012 database that can provide us with an easy example set.

 SELECT LastName, Number.I AS position,

      SUBSTRING(Person.LastName,Number.I,1) AS [char],

      UNICODE(SUBSTRING(Person.LastName, Number.I,1)) AS [Unicode]

 FROM /*Adventureworks2012.*/ Person.Person

   JOIN Tools.Number

    ON Number.I <= LEN(Person.LastName )

    AND UNICODE(SUBSTRING(Person.LastName, Number.I,1)) IS NOT NULL

 ORDER BY LastName;

This returns 111,969 rows (one for each character in a last name) in only around 3 seconds on a virtual machine hosted on my writing laptop (which is a quite decent Alienware MX11 Core 2 Duo 1.3 GHz; 8 GB; 250 GB, 7200RPM SATA drive; 11.6-inch netbook-sized laptop).

LastName position char Unicode
----------------- ---------- ------------------ ------------------
Abbas 1 A 65
Abbas 2 b 98
Abbas 3 b 98
Abbas 4 a 97
Abbas 5 s 115
Abel 1 A 65
Abel 2 b 98
Abel 3 e 101
Abel 4 l 108
...... ... . ...

With that set, you could easily start eliminating known safe Unicode values with a simple WHERE clause and find your evil outlier that is causing some issue with some process. For example, you could find all names that include a character not in the normal A–Z, space, apostrophe, or dash characters.

 SELECT LastName, Number.I AS Position,

       SUBSTRING(Person.LastName,Number.I,1) AS [Char],

       UNICODE(SUBSTRING(Person.LastName, Number.I,1)) AS [Unicode]

 FROM /*Adventureworks2012.*/ Person.Person

     JOIN Tools.Number

       ON Number.I <= LEN(Person.LastName )

        AND UNICODE(SUBSTRING(Person.LastName, Number.I,1)) IS NOT NULL

 --Note I used both a-z and A-Z in LIKE in case of case sensitive AW database

 WHERE SUBSTRING(Person.LastName, Number.I,1) NOT LIKE '[a-zA-Z ~''~-]' ESCAPE '~'

 ORDER BY LastName, Position;

This returns the following:

LastName position char Unicode
----------------- ---------- ------------------ ------------------
Mart¡nez 5 ¡ 161

This can be a remarkably powerful tool when trying to figure out what data is hurting your application with some unsupported text, particularly when dealing with a stream of data from an outside source.
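If you need to clean such values rather than just find them, the same numbers table can rebuild a string from only the allowed characters. The following is a sketch only (the FOR XML PATH concatenation trick is one common approach, and the sample value is contrived to mimic the previous output):

 DECLARE @string nvarchar(100) = N'Mart' + NCHAR(161) + N'nez'; --contrived bad value

 SELECT (SELECT SUBSTRING(@string, Number.I, 1)
         FROM   Tools.Number
         WHERE  Number.I BETWEEN 1 AND LEN(@string)
           AND  SUBSTRING(@string, Number.I, 1) LIKE '[a-zA-Z ~''~-]' ESCAPE '~'
         ORDER  BY Number.I
         FOR XML PATH(''), TYPE).value('.','nvarchar(100)') AS CleanedString; --'Martnez'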

Finding Gaps in a Sequence of Numbers

Another common issue that we have when using surrogate keys is that there can be gaps in their values. Ideally, this should not be an issue, but when troubleshooting errors it is often useful to be able to determine the missing numbers in a range. For example, say you have a table with a domain of values between 1 and 10. How might you determine which values aren’t used? This is fairly simple; you can just do a distinct query on the used values and then check to see which values aren’t there, right? Well, how about if you had to find missing values among 20,000+ distinct values? That is not quite going to work, especially if a lot of values aren’t used. For example, consider the Person table in the AdventureWorks2012 database. Running the following query, you can see that not every BusinessEntityID is used.

 SELECT MIN(BusinessEntityID) AS MinValue, MAX(BusinessEntityID) AS MaxValue,

     MAX(BusinessEntityID) - MIN(BusinessEntityID) + 1 AS ExpectedNumberOfRows,

     COUNT(*) AS NumberOfRows,

     MAX(BusinessEntityID) - COUNT(*) AS MissingRows

 FROM /*Adventureworks2012.*/ Person.Person;

This returns the following:

MinValue MaxValue ExpectedNumberOfRows NumberOfRows MissingRows
----------------- ---------- ------------------ ------------------ ------------------
1 20777 20777 19972 805

So we know that there are 805 rows missing between BusinessEntityID values 1 and 20777. To discover these rows, we take a set of values from 1 to 20777 with no gaps, and subtract the rows using the EXCEPT relational operator:

 SELECT Number.I

 FROM Tools.Number

 WHERE I BETWEEN 1 AND 20777

 EXCEPT

 SELECT BusinessEntityID

 FROM /*Adventureworks2012.*/ Person.Person;

Execute this query and you will find that there are 805 rows returned. Using the subtraction method with the Numbers table is a very powerful method that you can use in lots of situations where you need to find what isn’t there rather than what is.
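Building on that result, if a list of 805 individual values is too much to scan, you can collapse the missing values into contiguous ranges. This is just a sketch using the common value-minus-ROW_NUMBER grouping trick, not a technique from the text:

 ;WITH missing AS
 (
     SELECT Number.I
     FROM   Tools.Number
     WHERE  I BETWEEN 1 AND 20777
     EXCEPT
     SELECT BusinessEntityID
     FROM   /*Adventureworks2012.*/ Person.Person
 )
 SELECT MIN(I) AS GapStart, MAX(I) AS GapEnd
 FROM   (SELECT I, I - ROW_NUMBER() OVER (ORDER BY I) AS Grouper
         FROM   missing) AS Grouped
 GROUP  BY Grouper
 ORDER  BY GapStart;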

Separating Comma Delimited Items

My last example that you can translate to a direct business need comes from Erland Sommarskog’s web site ( www.sommarskog.se/ ) on arrays in SQL Server, as well as Aaron Bertrand’s old ASPFAQ web site. Using this code, you can take a comma-delimited list and return it as a table of values (which is the most desirable form for data in SQL Server, in case you have just started reading this book on this very page and haven’t learned about normalization yet).

 DECLARE @delimitedList varchar(100) = '1,2,3'

 SELECT SUBSTRING(',' + @delimitedList + ',',I + 1,

       CHARINDEX(',',',' + @delimitedList + ',',I + 1) - I - 1) AS value

 FROM Tools.Number

 WHERE I >= 1

    AND I < LEN(',' + @delimitedList + ',') - 1

    AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','

 ORDER BY I;

This returns the following:

 Value

 ------

 1

 2

 3

The way this code works is pretty interesting in and of itself. It works by doing a substring on each row. The key is in the WHERE clause.

 WHERE I >= 1

    AND I < LEN(',' + @delimitedList + ',') - 1

    AND SUBSTRING(',' + @delimitedList + ',', i, 1) = ','

The first line is there because SUBSTRING starts with position 1. The second limits the rows in Tools.Number to no more than the length of the @delimitedList variable (with the delimiters added). The third includes rows only where the SUBSTRING of the value at that position returns the delimiter, in this case, a comma. So, take the following query:

 DECLARE @delimitedList varchar(100) = '1,2,3';

 SELECT I

 FROM Tools.Number

 WHERE I >= 1

    AND I < LEN(',' + @delimitedList + ',') - 1

    AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','

 ORDER BY I;

Executing this, you will see the following results, showing you the position of each value in the list:

 I

 -----

 1

 3

 5

Since the list has a comma added to the beginning and end of it in the query, you will see that the positions of the commas are represented in the list. The SUBSTRING in the SELECT clause of the main query simply fetches all of the @delimitedList value up to the next comma. This sort of use of the numbers table will allow you to do what at first seems like it would require a massive, iterating algorithm in order to touch each position in the string individually (which would be slow in T-SQL, though you might get away with it in the CLR) and does it all at once in a set-based manner that is actually very fast.

Finally, I have expanded on this technique to allow you to do this for every row in a table that needs it by joining to the Tools.Number table on the values between 1 and the length of the string (plus delimiters). The best use for this code is to normalize a set of data where some programmer thought it was a good idea to store data in a comma-delimited list (it rarely is) so that you can use proper relational techniques to work with the data. It is actually pretty sad how often you may find this query useful.

 CREATE TABLE dbo.poorDesign

 (

    poorDesignId int,

    badValue varchar(20)

 );

 INSERT INTO dbo.poorDesign

 VALUES (1,'1,3,56,7,3,6'),

        (2,'22,3'),

        (3,'1');

The code just takes the stuff in the WHERE clause of the previous query and moves it into JOIN criteria.

 SELECT poorDesign.poorDesignId AS betterDesignId,

    SUBSTRING(',' + poorDesign.badValue + ',',I + 1,

    CHARINDEX(',',',' + poorDesign.badValue + ',', I + 1) - I - 1)

       AS betterScalarValue

 FROM dbo.poorDesign

    JOIN Tools.Number

    ON I >= 1

    AND I < LEN(',' + poorDesign.badValue + ',') - 1

    AND SUBSTRING(',' + poorDesign.badValue + ',', I, 1) = ',';

This returns the following:

betterDesignId betterScalarValue
----------------- -----------------
1 1
1 3
1 56
1 7
1 3
1 6
2 22
2 3
3 1

Ah…that’s much better. Each row of the output represents only a single value, not an array of values. As I have said many times throughout the book, SQL works great with atomic values, but try to get individual values out of a single column, and you get ugly code like I have just presented. It is an excellent solution for the problem; in fact, it is the fault of the problem that makes it ugly. Now just create a table using this better design of having one row per scalar value, insert the data, and drop the badly designed table. I won’t create the better design here, but we do need to clean up the poorDesign table with the following, lest someone stumble upon it and use it as a good idea:

 DROP TABLE dbo.poorDesign;
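If you end up using this splitting technique in several places, it can be convenient to package it as an inline table-valued function. The following is a sketch; the name Tools.String$Split and the parameterized delimiter are my own additions rather than objects from the text:

 CREATE FUNCTION Tools.String$Split
 (
     @delimitedList varchar(8000),
     @delimiter     char(1)
 )
 RETURNS TABLE
 AS
 RETURN
   (SELECT SUBSTRING(@delimiter + @delimitedList + @delimiter, I + 1,
               CHARINDEX(@delimiter, @delimiter + @delimitedList + @delimiter, I + 1) - I - 1)
                                                                                    AS value
    FROM   Tools.Number
    WHERE  I >= 1
      AND  I < LEN(@delimiter + @delimitedList + @delimiter) - 1
      AND  SUBSTRING(@delimiter + @delimitedList + @delimiter, I, 1) = @delimiter);

You could then write SELECT value FROM Tools.String$Split('1,3,56,7', ','); or use the function with CROSS APPLY against a table column to get the same results as the JOIN version shown earlier.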

Stupid Mathematic Tricks

I want to give you a final, (hopefully) entertaining, and esoteric usage of the numbers table to get your mind working on the possibilities. One of my favorite episodes of Futurama is an episode called “Lesser of Two Evils.” In this episode, Bender and the Bender look-alike named Flexo (they are both Bender units…did someone call for a nerd?) start talking and have the following exchange:

Bender: Hey, brobot, what’s your serial number?

Flexo: 3370318.

Bender: No way! Mine’s 2716057!

Fry (a human): I don’t get it.

Bender: We’re both expressible as the sum of two cubes!

So, I figured, the sum of two cubes would be an interesting and pretty easy abstract utilization of the numbers table. “Taxicab” numbers are also mentioned on the ScienceNews.org web site, where the goal is to discover the smallest value that can be expressed as the sum of two cubes in N different ways. They are called taxicab numbers because of an old (and pretty darn nerdy) story in which one mathematician remarked to another that the number 1729 on a taxicab was “dull,” to which the other remarked that it was very interesting, because it is the smallest number that can be expressed as the sum of two cubes in two different ways. (You can judge your own nerdiness by whether you think: A. This is stupid; B. This is cool; or C. You have done it yourself.)

How hard is the query? It turns out that once you have a numbers table with values from 1 to 100,000 or so, you can calculate that Taxicab(2) = 1729 very easily (along with all of the other numbers that are the sum of two cubes in two ways), and the sum of two cubes in three different ways is also pretty easy to find. It took three seconds on my laptop, and that value is 87539319.

But, instead of calculating the value of each integer cubed (POWER(I,3)) for each iteration, you can add a computed column to the table, this time as a bigint to give the calculations room to store the very large values you get when cubing the larger integers and summing two cubes together. You can do something like the following:

 ALTER TABLE Tools.Number

    ADD Ipower3 AS CAST(POWER(CAST(I AS bigint),3) AS bigint) PERSISTED;

    --Note that I had to cast I as bigint first to let the power function

    --return a bigint

Now, here is the code:

 DECLARE @level int = 2; --sum of two cubes in @level ways

 ;WITH cubes AS

 (SELECT Ipower3

 FROM Tools.Number

 WHERE I >= 1 AND I < 500) --<<<Vary for performance, and for cheating reasons,

                           --<<<max needed value

 SELECT c1.Ipower3 + c2.Ipower3 AS [sum of 2 cubes in @level Ways]

 FROM cubes AS c1

        CROSS JOIN cubes AS c2

 WHERE c1.Ipower3 <= c2.Ipower3 --this gets rid of the "duplicate" value pairs

 GROUP BY (c1.Ipower3 + c2.Ipower3)

 HAVING count(*) = @level

 ORDER BY [sum of 2 cubes in @level Ways];

This will return 559 rows in just a second or two. The first row is 1729, which is the smallest number that is the sum of two cubes in two different ways. OK, breaking this down, the cubes CTE is pretty simple.

 (SELECT Ipower3

 FROM    Tools.Number

 WHERE   I >= 1 AND I < 500)

This limits the values to a table of cubes, but only the first 499. The next part of the query is a bit more interesting. I sum the two cube values, which I get from cross-joining the CTE to itself.

 SELECT c1.Ipower3 + c2.Ipower3 AS [sum of 2 cubes in @level Ways]

 FROM   cubes AS c1

          CROSS JOIN cubes AS c2

 WHERE c1.Ipower3 <= c2.Ipower3 --this gets rid of the "duplicate" value pairs

The WHERE condition of c1.Ipower3 <= c2.Ipower3 gets rid of the “duplicate” value pairs since c1 and c2 have the same values, so without this, for 1729 you would get the following:

c1.Ipower3 c2.Ipower3
----------------- -----------------
1 1728
729 1000
1000 729
1728 1

These pairs are the same, technically, just reversed. I don’t eliminate equality, to allow for the case where both numbers are equal; such pairs won’t be doubled up in this set, and if the two cubes were of the same number, it would still be interesting if that sum matched a different sum of two cubes. With the condition in place, you are left with the following values:

c1.Ipower3 c2.Ipower3
----------------- -----------------
1 1728
729 1000

Now you can see that 1729 is the sum of two cubes in two different ways. So, lastly, the question of performance must come up. Reading the articles, it is clear that this is not a terribly easy problem to solve as the values get really large. The values for the sum of two cubes in three different ways are fairly simple to find, leaving the sequence values bounded at 500:

 DECLARE @level int = 3; --sum of two cubes in @level ways

 ;WITH cubes AS

 (SELECT Ipower3

 FROM Tools.Number

 WHERE I >= 1 AND I < 500) --<<< Vary for performance, and for cheating reasons,

                         --<<< max needed value

 SELECT c1.Ipower3 + c2.Ipower3 AS [sum of 2 cubes in @level Ways]

 FROM cubes AS c1

        CROSS JOIN cubes AS c2

 WHERE c1.Ipower3 < c2.Ipower3

 GROUP BY (c1.Ipower3 + c2.Ipower3)

 HAVING count(*) = @level

 ORDER BY [sum of 2 cubes in @level Ways];

This returns in around a second:

sum of 2 cubes in @level Ways
-----------------------------
87539319
119824488

Four, however, was a “bit” more challenging. Knowing the answer from the article, I knew I could set a boundary for my numbers using 20000 and get the answer.

 DECLARE @level int = 4 --sum of two cubes in @level ways

 ;WITH cubes AS

 (SELECT Ipower3

 FROM Tools.Number

 WHERE I >= 1 and I < 20000) --<<<Vary for performance, and for cheating reasons,

                             --<<<max needed value

 SELECT c1.Ipower3 + c2.Ipower3 AS [sum of 2 cubes in @level Ways]

 FROM cubes AS c1

       CROSS JOIN cubes AS c2

 WHERE c1.Ipower3 < c2.Ipower3

 GROUP BY (c1.Ipower3 + c2.Ipower3)

 HAVING count(*) = @level

 ORDER BY [sum of 2 cubes in @level Ways];

Using this “cheat” of knowing how to tweak the number you need to on my laptop’s VM, I was able to calculate that the minimum value of taxicab(4) was 6963472309248 (yes, it found only the one) in just 2 hours and 42 seconds on the same VM and laptop mentioned earlier. Clearly, the main value of T-SQL isn’t in tricks like this but that using a sequence table can give you the immediate jumping-off point to solve some problems that initially seem difficult.

image  Caution    Be careful where you try this code for very large values of @level. A primary limiting factor is tempdb space and you don’t want to blow up the tempdb on your production server only to have to explain this query to your manager. Trust me.

Calendar Table

It is a common need to do groupings and calculations with date values. For example, you might want sales grouped by month, week, year, or any other grouping. You can usually do this using the SQL Server date functions, but often it is costly in performance, and it is always a fairly non-obvious operation. What can truly help with this process is to use a table filled with date values, commonly called a calendar table. Using a calendar table is commonplace in Business Intelligence/OLAP implementations (something that is covered more in Chapter 14), but it certainly can be useful in OLTP databases when you get stuck writing a confusing date range query.

Using the same form of precalculated logic that we applied to the numbers table, we can create a table that contains one row per date. I will set the date as the primary key and then have data related to the date as columns. The following is the basic date table that I currently use. You can extend it as you want to include working days, holidays, special events, and so on, to filter/group by in the same manner as you do with these columns, along with the others I will add later in the section (again, I am working in a copy of AdventureWorks2012, bolting on my tools to make life easier).

 CREATE TABLE Tools.Calendar

 (

    DateValue date NOT NULL CONSTRAINT PKtools_calendar PRIMARY KEY,

    DayName varchar(10) NOT NULL,

    MonthName varchar(10) NOT NULL,

    Year varchar(60) NOT NULL,

    Day tinyint NOT NULL,

    DayOfTheYear smallint NOT NULL,

    Month smallint NOT NULL,

    Quarter tinyint NOT NULL

 );

image  Note    I wanted to make several of these columns into persisted computed columns, but it turns out that the datename and datepart functions we will use to load the data were not deterministic functions due to how they have to work with regionalization, so we will store the values manually. In SQL Server 2012, there is an additional way to format date data using FORMAT (also non-deterministic), which allows for specification of regionalization information. I didn’t change my example to use FORMAT as DATEPART and DATENAME are perfectly valid and a bit more readable methods of extracting date information from a date value, but FORMAT would allow you to reference multiple cultures in a single statement if you needed it.

The question of normalization might come up again at this point, and it is a valid question. Since these values we have created can be calculated (and quite easily in most cases), why do this? Isn’t this denormalization? There are a few ways to look at it, but I would generally say that it isn’t. Each row represents a day in time, and each of the columns is functionally dependent on the date value. The fact is, the functions have to look up or calculate the date attributes in some manner, so you are actually saving that lookup cost, however minimal it might be. When you have to filter on date criteria, the benefit of the normalized design can be quite obvious. Later in the section we will extend the table to add more interesting, company-based values, which certainly are normalized in the context of a date.

Note that the name Calendar is not truly consistent with our convention of names representing a single row, at least not in a natural way, but the other names that we might use (Date, DateValue, etc.) all sound either forced or hokey. Since the table represents a calendar, and a calendar item generally would logically represent a day, I went with it. A stretch perhaps, but naming is pretty difficult at times, especially when working in a namespace where the real world and SQL Server have so many established names.

The next step is to load the table with values, which is pretty much a straightforward task using the Numbers table that we just finished creating in the previous section. Using the datename and datepart functions and a few simple case expressions, you load the different values. I will make use of many of the functions in the examples, but most are very easy to understand.

 WITH dates (newDateValue) AS (

    SELECT DATEADD(day,I,'17530101') AS newDateValue

    FROM Tools.Number

 )

 INSERT Tools.Calendar

    (DateValue ,DayName

    ,MonthName ,Year ,Day

    ,DayOfTheYear ,Month ,Quarter

 )

 SELECT

    dates.newDateValue as DateValue,

    DATENAME (dw,dates.newDateValue) As DayName,

    DATENAME (mm,dates.newDateValue) AS MonthName,

    DATENAME (yy,dates.newDateValue) AS Year,

    DATEPART(day,dates.newDateValue) AS Day,

    DATEPART(dy,dates.newDateValue) AS DayOfTheYear,

    DATEPART(m,dates.newDateValue) AS Month,

    DATEPART(qq,dates.newDateValue) AS Quarter

 FROM dates

 WHERE dates.newDateValue BETWEEN '20000101' AND '20130101' --set the date range

 ORDER BY DateValue;

Just like the numbers table, there are several commonly useful ways to use the calendar table. The first I will demonstrate is general grouping types of queries, and the second is calculating ranges. As an example of grouping, say you want to know how many sales had been made during each year in Sales.SalesOrderHeader in the AdventureWorks2012 database. This is why there is a year column in the table:

 SELECT Calendar.Year, COUNT(*) as OrderCount

 FROM /*Adventureworks2012.*/ Sales.SalesOrderHeader

    JOIN Tools.Calendar

    --note, the cast here could be a real performance killer

    --consider using date columns where possible

    ON CAST(SalesOrderHeader.OrderDate as date) = Calendar.DateValue

 GROUP BY Calendar.Year

 ORDER BY Calendar.Year;

This returns the following:

Year OrderCount
----------------- -----------------
2005 1379
2006 3692
2007 12443
2008 13951

The beauty of the calendar table is that you can easily group values that are not that simple to compute in an obvious manner, all while using a natural relational coding style/technique. For example, what if you want to count the sales on Tuesdays and Thursdays?

 SELECT Calendar.DayName, COUNT(*) as OrderCount

 FROM /*Adventureworks2012.*/ Sales.SalesOrderHeader

    JOIN Tools.Calendar

          --note, the cast here could be a real performance killer

          --consider using date columns where possible

    ON CAST(SalesOrderHeader.OrderDate as date) = Calendar.DateValue

 WHERE Calendar.DayName in ('Tuesday','Thursday')

 GROUP BY Calendar.DayName

 ORDER BY Calendar.DayName;

This returns the following:

DayName OrderCount
----------------- -----------------
Thursday 4875
Tuesday 4483

image  Tip In many tables where only the date portion is used in computations like this, I will add a computed column that has the date only. For example, for the SalesOrderHeader table in AdventureWorks2012, I would have named the existing OrderDate column OrderTime and added a computed column named OrderDate defined as CAST(OrderTime AS date) that could have statistics and be indexed for faster operation.
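To make the tip concrete, here is a sketch on a hypothetical table (the real AdventureWorks2012 table already uses the name OrderDate for the datetime column, so the names here are illustrative only):

 CREATE TABLE dbo.SalesOrderExample
 (
     SalesOrderId int NOT NULL CONSTRAINT PKSalesOrderExample PRIMARY KEY,
     OrderTime    datetime NOT NULL,
     OrderDate    AS CAST(OrderTime AS date) PERSISTED --deterministic, so it can be persisted
 );

 CREATE INDEX XSalesOrderExample_OrderDate ON dbo.SalesOrderExample (OrderDate);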

OK, I see you are possibly still skeptical. What if I throw in the first Tuesday after the second Wednesday? How? Well, it is really a simple matter of using CTEs to build, step by step, the set that you need.

 ;WITH onlyWednesdays AS --get all Wednesdays

 (

    SELECT *,

      ROW_NUMBER() OVER (PARTITION BY Calendar.Year, Calendar.Month

              ORDER BY Calendar.Day) AS wedRowNbr

    FROM Tools.Calendar

    WHERE DayName = 'Wednesday'

 ),

 secondWednesdays AS --limit to second Wednesdays of the month

 (

    SELECT *

    FROM onlyWednesdays

    WHERE wedRowNbr = 2

 )

 ,finallyTuesdays AS --finally limit to the Tuesdays after the second wed

 (

    SELECT Calendar.*,

    ROW_NUMBER() OVER (PARTITION BY Calendar.Year, Calendar.Month

           ORDER by Calendar.Day) AS rowNbr

    FROM secondWednesdays

        JOIN Tools.Calendar

        ON secondWednesdays.Year = Calendar.Year

        AND secondWednesdays.Month = Calendar.Month

    WHERE Calendar.DayName = 'Tuesday'

    AND Calendar.Day > secondWednesdays.Day

 )

 --and in the final query, just get the one month

 SELECT Year, MonthName, Day

 FROM finallyTuesdays

 WHERE Year = 2012

     AND rowNbr = 1;

This returns the following set of values, which appear to be the result of some advanced calculations but are actually the results of a fairly simple query (to anyone in Chapter 12 of this book, for sure!):

Year MonthName Day
---- --------- ----
2012 January 17
2012 February 14
2012 March 20
2012 April 17
2012 May 15
2012 June 19
2012 July 17
2012 August 14
2012 September 18
2012 October 16
2012 November 20
2012 December 18

Now, utilizing another CTE, you could use this to join to the Sales.SalesOrderHeader table and find out sales on the first Tuesday after the second Wednesday. And that is exactly what I would do if I was being fed requirements by a madman (or perhaps they prefer the term marketing analyst?), and it would be done in no time. However, if this date became a corporate standard date, I would add a column to the calendar table (maybe called FirstTuesdayAfterSecondWednesdayFlag) and set it to 1 for every date that matches (and in the one month where that day was a holiday and they wanted it to be the Thursday after the second Wednesday instead…well, the change would be a simple update).

The realistic application of this is a company sale that routinely starts on a given day every month, so a report needs to know how sales went for the four days after. In that case, perhaps the column would be BigSaleDaysFlag, or whatever works. The goal is the same as back in Chapter 7 when we talked about data-driven design: store data in columns and rules in code. A new type of day is just a new column and calculation that every user now has access to without having to know the calculation that was used to create the value.
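As a sketch of that data-driven approach (using the FirstTuesdayAfterSecondWednesdayFlag name mentioned earlier; note that this column is not part of the final version of the table built later in the chapter), you could add the column and set it using the same CTE logic from the previous query:

 ALTER TABLE Tools.Calendar
     ADD FirstTuesdayAfterSecondWednesdayFlag bit NOT NULL
         CONSTRAINT DFLTCalendar_FirstTuesdayAfterSecondWednesdayFlag DEFAULT (0);
 GO
 ;WITH secondWednesdays AS --second Wednesday of each month
 (
     SELECT Year, Month, Day
     FROM  (SELECT Year, Month, Day,
                   ROW_NUMBER() OVER (PARTITION BY Year, Month ORDER BY Day) AS wedRowNbr
            FROM   Tools.Calendar
            WHERE  DayName = 'Wednesday') AS onlyWednesdays
     WHERE wedRowNbr = 2
 ), firstTuesdayAfter AS --first Tuesday following that Wednesday
 (
     SELECT Calendar.DateValue,
            ROW_NUMBER() OVER (PARTITION BY Calendar.Year, Calendar.Month
                               ORDER BY Calendar.Day) AS rowNbr
     FROM   secondWednesdays
              JOIN Tools.Calendar
                ON secondWednesdays.Year = Calendar.Year
                   AND secondWednesdays.Month = Calendar.Month
     WHERE  Calendar.DayName = 'Tuesday'
       AND  Calendar.Day > secondWednesdays.Day
 )
 UPDATE Calendar
 SET    FirstTuesdayAfterSecondWednesdayFlag = 1
 FROM   Tools.Calendar AS Calendar
          JOIN firstTuesdayAfter
            ON firstTuesdayAfter.DateValue = Calendar.DateValue
 WHERE  firstTuesdayAfter.rowNbr = 1;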

In most standard calendar table utilizations, it will be common to have, at a minimum, the following generic events/time ranges:

    --Saturday or Sunday set to 1, else 0

    WeekendFlag bit NOT NULL,

    FiscalYear smallint NOT NULL,

    FiscalMonth tinyint NULL,

    FiscalQuarter tinyint NOT NULL

Almost every company has a fiscal year that it uses for its business calendar. This technique allows you to treat the fiscal time periods more or less like regular calendar dates in your code, with no modification at all. I will demonstrate this after we load the data in just a few paragraphs.

This covers the basics; now I’ll discuss one last thing you can add to the calendar table to solve the problem that really shifts this into “must-have” territory: floating windows of time.

image  Note    This is actually one of the few techniques that I created on my own. I am not actually claiming that I am the only person to ever do this, but I have not read about it anywhere else, and I built this myself when trying to solve a particular type of problem using the system functions to no avail.

The goal for the floating windows of time was to add a relative positioning value to the table so that you could easily get N time periods from a point in time. Years are already contiguous, increasing numbers, so it is easy to do math with them. But it is not particularly comfortable to do math with months. I found myself often having to get the past 12 months of activity, sometimes including the current month and sometimes not. Doing math that wraps around a 12-month calendar was a pain, so to the calendar, I add the following columns that I will load with increasing values with no gaps:

 RelativeDayCount int NOT NULL,

 RelativeWeekCount int NOT NULL,

 RelativeMonthCount int NOT NULL

Using these columns I will store sequence numbers that start at an arbitrary point in time. I will use '20000101' here, but it is really unimportant, and negative values are not a problem either. You should never refer to the value itself, just the value’s relative position to some point in time you choose. Days are numbered from that point (negative before, positive after), as are weeks and months. This gives us the following “final” calendar table:

 DROP TABLE Tools.Calendar

 GO

 CREATE TABLE Tools.Calendar

 (

      DateValue date NOT NULL CONSTRAINT PKtools_calendar PRIMARY KEY,

      DayName varchar(10) NOT NULL,

      MonthName varchar(10) NOT NULL,

      Year varchar(60) NOT NULL,

      Day tinyint NOT NULL,

      DayOfTheYear smallint NOT NULL,

      Month smallint NOT NULL,

      Quarter tinyint NOT NULL,

      WeekendFlag bit NOT NULL,

      --start of fiscal year configurable in the load process, currently

      --only supports fiscal months that match the calendar months.

      FiscalYear smallint NOT NULL,

      FiscalMonth tinyint NULL,

      FiscalQuarter tinyint NOT NULL,

      --used to give relative positioning, such as the previous 10 months

      --which can be annoying due to month boundaries

      RelativeDayCount int NOT NULL,

      RelativeWeekCount int NOT NULL,

      RelativeMonthCount int NOT NULL

 )

Last, I will reload the table with the following code:

 ;WITH dates (newDateValue) AS (

      SELECT DATEADD(day,I,'17530101') AS newDateValue

      FROM Tools.Number

 )

 INSERT Tools.Calendar

      (DateValue ,DayName

      ,MonthName ,Year ,Day

      ,DayOfTheYear ,Month ,Quarter

      ,WeekendFlag ,FiscalYear ,FiscalMonth

      ,FiscalQuarter ,RelativeDayCount,RelativeWeekCount

      ,RelativeMonthCount)

 SELECT

      dates.newDateValue AS DateValue,

      DATENAME (dw,dates.newDateValue) AS DayName,

      DATENAME (mm,dates.newDateValue) AS MonthName,

      DATENAME (yy,dates.newDateValue) AS Year,

      DATEPART(day,dates.newDateValue) AS Day,

      DATEPART(dy,dates.newDateValue) AS DayOfTheYear,

      DATEPART(m,dates.newDateValue) AS Month,

      CASE

        WHEN MONTH( dates.newDateValue) <= 3 THEN 1

        WHEN MONTH( dates.newDateValue) <= 6 THEN 2

        WHEN MONTH( dates.newDateValue) <= 9 THEN 3

      ELSE 4 END AS quarter,

      CASE WHEN DATENAME (dw,dates.newDateValue) IN ('Saturday','Sunday')

        THEN 1

        ELSE 0

      END AS weekendFlag,

      ------------------------------------------------

      --the next three blocks assume a fiscal year starting in July.

      --change if your fiscal periods are different

      ------------------------------------------------

      CASE

        WHEN MONTH(dates.newDateValue) <= 6

        THEN YEAR(dates.newDateValue)

        ELSE YEAR (dates.newDateValue) + 1

      END AS fiscalYear,

      CASE

        WHEN MONTH(dates.newDateValue) <= 6

        THEN MONTH(dates.newDateValue) + 6

        ELSE MONTH(dates.newDateValue) - 6

      END AS fiscalMonth,

      CASE

        WHEN MONTH(dates.newDateValue) <= 3 then 3

        WHEN MONTH(dates.newDateValue) <= 6 then 4

        WHEN MONTH(dates.newDateValue) <= 9 then 1

      ELSE 2 END AS fiscalQuarter,

      ------------------------------------------------

      --end of fiscal quarter = july

      ------------------------------------------------

      --these values can be anything, as long as they

      --provide contiguous values on year, month, and week boundaries

      DATEDIFF(day,'20000101',dates.newDateValue) AS RelativeDayCount,

      DATEDIFF(week,'20000101',dates.newDateValue) AS RelativeWeekCount,

      DATEDIFF(month,'20000101',dates.newDateValue) AS RelativeMonthCount

 FROM dates

 WHERE dates.newDateValue BETWEEN '20000101' AND '20130101'; --set the date range

Now we can build a query to count the orders that occurred on weekends, grouped by FiscalYear, as follows:

 SELECT Calendar.FiscalYear, COUNT(*) AS OrderCount

 FROM /*Adventureworks2012.*/ Sales.SalesOrderHeader

      JOIN Tools.Calendar

        --note, the cast here could be a real performance killer

        --consider using a persisted calculated column here

        ON CAST(SalesOrderHeader.OrderDate as date) = Calendar.DateValue

 WHERE WeekendFlag = 1

 GROUP BY Calendar.FiscalYear

 ORDER BY Calendar.FiscalYear;

This returns the following:

 FiscalYear      OrderCount

 ----------      ----------

 2006            734

 2007            1114

 2008            6855

 2009            234

To demonstrate the floating windows of time using the Relative_____Count columns, consider that you want to count the sales for the previous two weeks. It’s not impossible to do this using the date functions perhaps, but it’s simple to do with a calendar table:

 DECLARE @interestingDate date = '20120509';

 SELECT Calendar.DateValue as PreviousTwoWeeks, CurrentDate.DateValue AS Today,

      Calendar.RelativeWeekCount

 FROM Tools.Calendar

      JOIN (SELECT *

          FROM Tools.Calendar

          WHERE DateValue = @interestingDate) AS CurrentDate

      ON Calendar.RelativeWeekCount < (CurrentDate.RelativeWeekCount)

        and Calendar.RelativeWeekCount >=

          (CurrentDate.RelativeWeekCount -2);

This returns the following:

PreviousTwoWeeks Today RelativeWeekCount
----------------- ---------- ------------------
2012-04-22 2012-05-09 643
2012-04-23 2012-05-09 643
2012-04-24 2012-05-09 643
2012-04-25 2012-05-09 643
2012-04-26 2012-05-09 643
2012-04-27 2012-05-09 643
2012-04-28 2012-05-09 643
2012-04-29 2012-05-09 644
2012-04-30 2012-05-09 644
2012-05-01 2012-05-09 644
2012-05-02 2012-05-09 644
2012-05-03 2012-05-09 644
2012-05-04 2012-05-09 644
2012-05-05 2012-05-09 644

From this, we can see that the previous two weeks start on Sunday (04/22/2012 and 04/29/2012) and end on Saturday (04/28/2012 and 05/05/2012). The dates 05/06/2012 to 05/09/2012 are not included because that is the current week (relative to the variable value). The basic structure of the query is simply to take a derived table that fetches the calendar row for the “interesting” date and then join that to the full calendar table using a range of relative values in the join criteria. In the previous-two-weeks example, I used the following:

 Calendar.RelativeWeekCount < (CurrentDate.RelativeWeekCount)

      AND Calendar.RelativeWeekCount >= (CurrentDate.RelativeWeekCount -2)

This was because I wanted the weeks that are previous to the current week, going back two weeks. Weeks aren’t exactly the sweet spot of this technique, because weeks are of fixed length anyway (but they are easier to show as a full result set in print, since I don’t get paid by the page). Now, join the values to your sales table, and you can see sales for the past two weeks. Want to see it broken down by week? Group by RelativeWeekCount. Use the following if you want the previous 12 months:

 DECLARE @interestingDate date = '20120509'

 SELECT MIN(Calendar.DateValue) AS MinDate, MAX(Calendar.DateValue) AS MaxDate

 FROM Tools.Calendar

     JOIN (SELECT *

       FROM Tools.Calendar

       WHERE DateValue = @interestingDate) as CurrentDate

   ON Calendar.RelativeMonthCount < (CurrentDate.RelativeMonthCount)

     AND Calendar.RelativeMonthCount >=

       (CurrentDate.RelativeMonthCount -12);

This returns the following:

MinDate MaxDate
----------------- -----------------
2011-05-01 2012-04-30

This query includes all earlier months, but not the month of the date loaded into @interestingDate. Notice that the results did not include the current month. Change the < to <=, and it will include the current month. If you want to get a 24-month window, get the 12 months before and the 12 months after the current month, as follows:

 ON Calendar.RelativeMonthCount < (CurrentDate.RelativeMonthCount + 12)

   AND Calendar.RelativeMonthCount >=

     (CurrentDate.RelativeMonthCount -12)

Using that ON clause results in the following:

MinDate MaxDate
----------------- -----------------
2007-03-01 2009-02-28

Now you can use these dates in other criteria, either by assigning these values to a variable or (if you are one of the cool kids) by using the tables in a join to other tables. As a real example, let’s hop in the Wabac machine and look at sales in the Adventureworks2012 database around September 27, 2008. (Keep in mind that the sample databases could change in the future to get more up-to-date data, but the 2012 RTM version had dates that were the same as the 2008 and the 2008 R2 editions of AdventureWorks.)

 DECLARE @interestingDate date = '20080927';

 SELECT Calendar.Year, Calendar.Month, COUNT(*) AS OrderCount

 FROM /*Adventureworks2012.*/ Sales.SalesOrderHeader

     JOIN Tools.Calendar

      JOIN (SELECT *

        FROM Tools.Calendar

        WHERE DateValue = @interestingDate) as CurrentDate

          ON Calendar.RelativeMonthCount <=

          (CurrentDate.RelativeMonthCount )

        AND Calendar.RelativeMonthCount >=

          (CurrentDate.RelativeMonthCount -10)

     ON cast(SalesOrderHeader.ShipDate as date)= Calendar.DateValue

 GROUP BY Calendar.Year, Calendar.Month

 ORDER BY Calendar.Year, Calendar.Month;

This query will give you the items that shipped in the previous ten months as well as the current month (though, as the results show, there are no September 2008 ship dates in the sample data), grouped by month, as follows:

Year Month OrderCount
----------------- ----------------- -----------------
2007 11 1825
2007 12 2228
2008 1 1998
2008 2 2034
2008 3 2100
2008 4 2058
2008 5 2395
2008 6 2360
2008 7 1271
2008 8 211

I included the current month by changing the first condition to <=. So, you should be able to see that the calendar table and the numbers table are two excellent tables to add to almost any database. They give you the ability to take a functional problem like getting the last day of the month or the previous time periods and turn it into a relational question that SQL Server can chew up and spit out using the relational techniques it is built for.

Keep in mind too that creating more fiscal calendars, reporting calendars, corporate holiday calendars, sales calendars, and so forth, is easily done by adding more columns to this very same calendar table. For instance, you may have multiple business units, each with their own fiscal calendar. With the one normal calendar, you can add any variation of the fiscal calendar multiple times. It’s easy to just add those columns, since they all relate back to a normal calendar date.
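For example, here is a sketch of what supporting a second business unit’s calendar might look like (the column names are purely illustrative):

 ALTER TABLE Tools.Calendar
     ADD EuropeFiscalYear smallint NULL,
         EuropeFiscalMonth tinyint NULL,
         EuropeFiscalQuarter tinyint NULL;
 --then populate the new columns with that business unit's rules, just as the main
 --fiscal columns were populated in the load code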

One last thing to note is that the calendar table needs to be maintained. The INSERT statement we wrote earlier can be changed slightly so that it can be rerun to extend the number of rows as time passes. Perhaps a SQL Server Agent job runs on the first day of the year to add another set of rows. I have also used a process to keep the calendar just a few weeks into the future for a payroll system; the rules changed on occasion, so the new rows were added just a few weeks in advance based on whatever rules were then in effect.
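As a sketch of what such a maintenance process might look like (the procedure name and the keep-one-year-ahead rule are my own choices), the following wraps the same expressions as the initial load but only inserts dates that are not yet in the table. Note that, with Tools.Number loaded only to 99999, the DATEADD from 1753 can generate dates only out to roughly the year 2026:

 CREATE PROCEDURE Tools.Calendar$extend
 AS
 BEGIN
     DECLARE @lastDate date = ISNULL((SELECT MAX(DateValue) FROM Tools.Calendar),'19991231'),
             @stopDate date = DATEADD(year, 1, CAST(SYSDATETIME() AS date));

     WITH dates (newDateValue) AS
     (
         SELECT DATEADD(day, I, '17530101') AS newDateValue
         FROM   Tools.Number
     )
     INSERT Tools.Calendar (DateValue, DayName, MonthName, Year, Day, DayOfTheYear,
                            Month, Quarter, WeekendFlag, FiscalYear, FiscalMonth,
                            FiscalQuarter, RelativeDayCount, RelativeWeekCount,
                            RelativeMonthCount)
     SELECT dates.newDateValue,
            DATENAME(dw, dates.newDateValue),
            DATENAME(mm, dates.newDateValue),
            DATENAME(yy, dates.newDateValue),
            DATEPART(day, dates.newDateValue),
            DATEPART(dy, dates.newDateValue),
            DATEPART(m, dates.newDateValue),
            CASE WHEN MONTH(dates.newDateValue) <= 3 THEN 1
                 WHEN MONTH(dates.newDateValue) <= 6 THEN 2
                 WHEN MONTH(dates.newDateValue) <= 9 THEN 3 ELSE 4 END,
            CASE WHEN DATENAME(dw, dates.newDateValue) IN ('Saturday','Sunday')
                 THEN 1 ELSE 0 END,
            --fiscal year starting in July, matching the original load
            CASE WHEN MONTH(dates.newDateValue) <= 6 THEN YEAR(dates.newDateValue)
                 ELSE YEAR(dates.newDateValue) + 1 END,
            CASE WHEN MONTH(dates.newDateValue) <= 6 THEN MONTH(dates.newDateValue) + 6
                 ELSE MONTH(dates.newDateValue) - 6 END,
            CASE WHEN MONTH(dates.newDateValue) <= 3 THEN 3
                 WHEN MONTH(dates.newDateValue) <= 6 THEN 4
                 WHEN MONTH(dates.newDateValue) <= 9 THEN 1 ELSE 2 END,
            DATEDIFF(day, '20000101', dates.newDateValue),
            DATEDIFF(week, '20000101', dates.newDateValue),
            DATEDIFF(month, '20000101', dates.newDateValue)
     FROM   dates
     WHERE  dates.newDateValue > @lastDate
       AND  dates.newDateValue <= @stopDate;
 END;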

Utility Objects

As you acquire more and more experience over the years (which is loosely synonymous with the many failures you will encounter that teach you lessons), you will undoubtedly end up with a toolbox full of various tools that you find you use quite frequently. Some of these tools will be used in an ad hoc manner when you are dealing with the same problems over and over again. I personally like to simply keep a set of files/projects stored in my Dropbox/SkyDrive folders for easy accessibility when I am working, writing, or just poking around at SQL Server to see what I can see at the office, home, or even at a client site.

Beyond the type of utilities that are used infrequently, the toolbox will contain a rather large set of tools that become part of the regular processing of the system. For these types of tools, the logistics of their use can pose a problem. How do you make sure that they are usable by every database, as needed, and allow for the differences that will occur in every system (while keeping your databases portable so that you can move them to a new server with the least amount of pain)? In the old days of the twentieth century, we would create a system object in the master database prefixed with sp_ (and mark it as a system object) and diligently maintain all servers to be as equal as possible. While this is still possible, we found that too often, when we wanted to make a change to the object, we couldn’t do it, because with N systems using the object, we usually found N different ways the object was being used. Then you would have abnormal procedures spread around the enterprise with names that included a database that might exist on a single server (and, over time, didn’t even exist anymore). As time passed, we stopped using master and started adding a utility database that was managed on every server. Ideally these were identical, but over time the DBAs and/or developers used the database for “temp” usages that again became unwieldy and decidedly not temporary.

What seems to be the best answer is to create a schema in each database that contains the tools that that database actually uses. By putting the objects in each database individually, it is easier to decide whether an addition to an object should become a new object that is specific to the single database or simply a new version of the existing object. If it is just a new version, the systems that were using the object would take the change as an upgrade after due testing. In either case, each database becomes more stand-alone, so that moving from server A to server B means a lot less preparation on server B trying to meet the needs of the database. A few SQL Server Agent jobs may need to be created to execute the utility code on a schedule, but that is about it.

For the examples, I chose a few of the types of utilities that I find useful in the databases I design. I will present one example of each, though this is only just a start. We will look at solutions for the following:

  • Monitoring tools : Tools to watch what is going on with the database such as row counts, file sizes, and so forth.
  • Extended DDL utilities : Tools used to make changes to the structure of the database, such as removing keys or indexes for load processes, often doing in a single call what SQL Server’s DDL would require multiple calls to accomplish. (A small sketch of this sort of utility follows this list.)
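As an illustration of the extended DDL idea, here is a sketch of one such helper. The procedure name and approach are my own (not something presented later in the chapter); it builds and runs the multiple ALTER TABLE calls needed to disable every foreign key constraint on a table before a bulk load:

 CREATE PROCEDURE Tools.Table$DisableForeignKeys
     @schemaName sysname,
     @tableName  sysname
 AS
 BEGIN
     --build one ALTER TABLE...NOCHECK CONSTRAINT statement per foreign key on the table
     DECLARE @sql nvarchar(max) = N'';

     SELECT @sql += N'ALTER TABLE ' + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName)
                  + N' NOCHECK CONSTRAINT ' + QUOTENAME(foreign_keys.name) + N';'
     FROM   sys.foreign_keys
     WHERE  foreign_keys.parent_object_id =
                    OBJECT_ID(QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName));

     EXECUTE (@sql);
 END;

A companion procedure that runs ALTER TABLE…WITH CHECK CHECK CONSTRAINT would re-enable (and revalidate) the constraints after the load.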

Monitoring Objects

Keeping an eye on the database usage is a very common need for the database administrator. A very typical question from upper management is to find out how much a system is used, and particularly how much the database has grown over time. With a little bit of planning it is easy to be prepared for these questions and others by doing a bit of monitoring.

In the example, I will build a table that will capture the row counts of all of the tables in the database (other than those in the sys schema) by running a stored procedure. It is set up to use the sys.partitions catalog view, because it gives a "good enough" count of the rows in all of the tables. If it is important to get extremely precise row counts, you could instead use a cursor and execute a SELECT COUNT(*) against each table in the database.
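If exact counts matter, a rough sketch of that cursor-based alternative might look like the following; it simply prints each count rather than writing it to the Monitor table, so you would adapt it to your own needs:

 --A sketch only: loop over the user tables and get an exact COUNT(*) for each
 DECLARE @schemaName sysname, @tableName sysname, @rowCount int, @sql nvarchar(1000);

 DECLARE tableCursor CURSOR LOCAL FAST_FORWARD FOR
     SELECT OBJECT_SCHEMA_NAME(object_id), name
     FROM   sys.tables;

 OPEN tableCursor;
 FETCH NEXT FROM tableCursor INTO @schemaName, @tableName;
 WHILE @@FETCH_STATUS = 0
 BEGIN
     SET @sql = N'SELECT @rowCount = COUNT(*) FROM ' + QUOTENAME(@schemaName)
                + N'.' + QUOTENAME(@tableName) + N';';
     EXEC sp_executesql @sql, N'@rowCount int OUTPUT', @rowCount = @rowCount OUTPUT;

     PRINT @schemaName + N'.' + @tableName + N': ' + CAST(@rowCount AS nvarchar(10));

     FETCH NEXT FROM tableCursor INTO @schemaName, @tableName;
 END;
 CLOSE tableCursor;
 DEALLOCATE tableCursor;

Bear in mind that COUNT(*) scans each table, so on large databases this approach is considerably more expensive than reading sys.partitions.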

For the monitoring data, I will create a Monitor schema. I will not grant rights on it to anyone by default, as the Monitor schema will generally be for the DBA to get a feel for system growth.

 CREATE SCHEMA Monitor;

Next I will create a table, with the grain of the data being one row per table per day. The procedure will be created to allow you to capture row counts more than once a day if needed. Note too that the ObjectType column can contain more than just tables, since it might be interesting to see whether indexed views are growing in size as well. I will include only clustered or heap structures so that we get only the base structures, not every index.

 CREATE TABLE Monitor.TableRowCount
 (
     SchemaName sysname NOT NULL,
     TableName sysname NOT NULL,
     CaptureDate date NOT NULL,
     Rows integer NOT NULL,
     ObjectType sysname NOT NULL,
     CONSTRAINT PKMonitor_TableRowCount PRIMARY KEY (SchemaName, TableName, CaptureDate)
 );

Then the following procedure will be scheduled to run daily:

 CREATE PROCEDURE Monitor.TableRowCount$captureRowcounts
 AS
 -- ----------------------------------------------------------------
 -- Monitor the row counts of all tables in the database on a daily basis
 -- Error handling not included for example clarity
 --
 -- NOTE: This code expects the Monitor.TableRowCount table to be in the same
 -- database as the tables being monitored. Rework would be needed if this is
 -- not a possibility
 --
 -- 2012 Louis Davidson – [email protected] – drsql.org
 -- ----------------------------------------------------------------
 -- The CTE is used to set up the set of rows to put into the Monitor.TableRowCount table
 WITH CurrentRowcount AS (
      SELECT OBJECT_SCHEMA_NAME(partitions.object_id) AS SchemaName,
             OBJECT_NAME(partitions.object_id) AS TableName,
             CAST(getdate() AS date) AS CaptureDate,
             SUM(rows) AS Rows,
             objects.type_desc AS ObjectType
      FROM   sys.partitions
             JOIN sys.objects
                  ON partitions.object_id = objects.object_id
      WHERE  index_id IN (0,1) --Heap 0 or Clustered 1 "indexes"
        AND  OBJECT_SCHEMA_NAME(partitions.object_id) NOT IN ('sys')
      --the GROUP BY handles partitioned tables with > 1 partition
      GROUP BY partitions.object_id, objects.type_desc)
 --MERGE allows this procedure to be run more than once a day without concern;
 --it will update the row for the day if it already exists
 MERGE Monitor.TableRowCount
 USING (SELECT SchemaName, TableName, CaptureDate, Rows, ObjectType
        FROM   CurrentRowcount) AS Source
        ON (Source.SchemaName = TableRowCount.SchemaName
            AND Source.TableName = TableRowCount.TableName
            AND Source.CaptureDate = TableRowCount.CaptureDate)
 WHEN MATCHED THEN
      UPDATE SET Rows = Source.Rows
 WHEN NOT MATCHED THEN
      INSERT (SchemaName, TableName, CaptureDate, Rows, ObjectType)
      VALUES (Source.SchemaName, Source.TableName, Source.CaptureDate,
              Source.Rows, Source.ObjectType);
 GO

Now, you execute the following procedure and check the results for the HumanResources schema in AdventureWorks2012, where we have been working:

 EXEC Monitor.TableRowCount$captureRowcounts;

 SELECT *
 FROM   Monitor.TableRowCount
 WHERE  SchemaName = 'HumanResources'
 ORDER BY SchemaName, TableName;

Then (assuming the data is still in its pristine state), the output of this batch will be as follows:

SchemaName       TableName                  CaptureDate  Rows  ObjectType
---------------- -------------------------- ------------ ----- -----------
HumanResources   Department                 2012-02-25   16    USER_TABLE
HumanResources   Employee                   2012-02-25   290   USER_TABLE
HumanResources   EmployeeDepartmentHistory  2012-02-25   296   USER_TABLE
HumanResources   EmployeePayHistory         2012-02-25   316   USER_TABLE
HumanResources   JobCandidate               2012-02-25   13    USER_TABLE
HumanResources   Shift                      2012-02-25   3     USER_TABLE

If you look at the rows captured for the tables in the Monitor schema itself, you will see that they show 0 rows, since the counts were captured before we added these rows. Run the procedure again and you will notice that the count for the Monitor.TableRowCount table has increased, reflecting the rows we just added. I tend to capture the row counts of the tables in the Monitor and Tools schemas as well. In many cases, I will then add a procedure to check for abnormal growth of rows in a table. For example, if the calendar table changes row counts (up or down), there could easily be an issue, since that table should generally grow only once, at the end of the year. You might also write a query to make sure that the monitoring counts are changing only in expected ways and alert the admin if not, as sketched next.
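As a sketch of that last idea (how you alert on the result, Database Mail, an Agent alert, and so on, is left to you), the following query compares the two most recent captures for each table and returns any table whose row count changed:

 --A sketch only: return tables whose count changed between the last two captures
 WITH OrderedCounts AS (
     SELECT SchemaName, TableName, CaptureDate, Rows,
            ROW_NUMBER() OVER (PARTITION BY SchemaName, TableName
                               ORDER BY CaptureDate DESC) AS RecencyOrder
     FROM   Monitor.TableRowCount
 )
 SELECT Newer.SchemaName, Newer.TableName,
        Older.Rows AS PreviousRows, Newer.Rows AS CurrentRows
 FROM   OrderedCounts AS Newer
        JOIN OrderedCounts AS Older
             ON Older.SchemaName = Newer.SchemaName
                AND Older.TableName = Newer.TableName
                AND Older.RecencyOrder = 2
 WHERE  Newer.RecencyOrder = 1
   AND  Newer.Rows <> Older.Rows;

Flip the final condition around (for example, Newer.Rows <= Older.Rows) to look for tables that have stopped growing instead.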

Extended DDL Utilities

In a high number of the systems I work with, data is constantly being moved around, sometimes in very large amounts. I almost always make sure that there are foreign key relationships, check constraints, unique constraints, and so on in place, as a way to make sure that the data that is loaded meets the quality standards that the user demands.

However, constraints can really slow down the loading of data, so quite often the loading program (such as SSIS) disables your constraints to make the data load faster. Unfortunately, it doesn't re-enable the constraints after it is done loading. So I created the following procedure to re-enable the constraints in some or all of the tables in the database. I will put the procedure in a Utility schema and restrict access to sysadmin users only (even going so far as to use a DENY permission for non-sysadmin access).

 CREATE SCHEMA Utility;
 GO

 CREATE PROCEDURE Utility.Constraints$ResetEnableAndTrustedStatus
 (
     @table_name sysname = '%',
     @table_schema sysname = '%',
     @doForeignKeyFlag bit = 1,
     @doCheckFlag bit = 1
 ) AS
 -- ----------------------------------------------------------------
 -- Enables disabled foreign key and check constraints, and sets
 -- trusted status so the optimizer can use them
 --
 -- NOTE: This code works on the constraints in the database it is created in.
 -- Rework would be needed to run it against other databases
 --
 -- 2012 Louis Davidson – [email protected] – drsql.org
 -- ----------------------------------------------------------------
 BEGIN
     SET NOCOUNT ON;
     DECLARE @statements CURSOR; --used to loop through constraints to execute one
                                 --constraint per individual DDL call
     SET @statements = CURSOR FOR
         WITH FKandCHK AS
            (SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS schemaName,
                    OBJECT_NAME(parent_object_id) AS tableName,
                    name AS constraintName, type_desc AS constraintType,
                    is_disabled AS DisabledFlag,
                    (is_not_trusted + 1) % 2 AS TrustedFlag
             FROM   sys.foreign_keys
             UNION ALL
             SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS schemaName,
                    OBJECT_NAME(parent_object_id) AS tableName,
                    name AS constraintName, type_desc AS constraintType,
                    is_disabled AS DisabledFlag,
                    (is_not_trusted + 1) % 2 AS TrustedFlag
             FROM   sys.check_constraints)
         SELECT schemaName, tableName, constraintName, constraintType,
                DisabledFlag, TrustedFlag
         FROM   FKandCHK
         WHERE  (TrustedFlag = 0 OR DisabledFlag = 1)
           AND  ((constraintType = 'FOREIGN_KEY_CONSTRAINT' AND @doForeignKeyFlag = 1)
                  OR (constraintType = 'CHECK_CONSTRAINT' AND @doCheckFlag = 1))
           AND  schemaName LIKE @table_schema
           AND  tableName LIKE @table_name;

     OPEN @statements;

     DECLARE @statement varchar(1000), @schemaName sysname,
             @tableName sysname, @constraintName sysname,
             @constraintType sysname, @disabledFlag bit, @trustedFlag bit;

     WHILE 1=1
     BEGIN
         FETCH FROM @statements INTO @schemaName, @tableName, @constraintName,
                                     @constraintType, @disabledFlag, @trustedFlag;
         IF @@FETCH_STATUS <> 0
             BREAK;

         BEGIN TRY --will output an error if it occurs but will keep on going
                   --so other constraints will be adjusted
             IF @constraintType = 'CHECK_CONSTRAINT'
                 SELECT @statement = 'ALTER TABLE ' + @schemaName + '.'
                                     + @tableName + ' WITH CHECK CHECK CONSTRAINT '
                                     + @constraintName;
             ELSE IF @constraintType = 'FOREIGN_KEY_CONSTRAINT'
                 SELECT @statement = 'ALTER TABLE ' + @schemaName + '.'
                                     + @tableName + ' WITH CHECK CHECK CONSTRAINT '
                                     + @constraintName;

             EXEC (@statement);
         END TRY
         BEGIN CATCH --output the statement that was executed along with the error number
             SELECT 'Error occurred: ' + CAST(ERROR_NUMBER() AS varchar(10)) + ':' +
                    ERROR_MESSAGE() + CHAR(13) + CHAR(10) + 'Statement executed: ' +
                    @statement;
         END CATCH
     END;
 END;
 GO

I have several more of these utilities available in the downloads on my web site (drsql.org) to manage (and mostly drop) all sorts of objects in bulk (constraints, indexes, and so on) for when you need to remove the constraints from a table, often in an attempt to update the structure. I keep a model database/structure snapshot of how the database should look, so I can remove anything I need to and simply add it back afterward using a comparison tool.
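For example, a companion to the procedure just shown might disable the matching constraints before a large load. The following is only a sketch (the name Utility.Constraints$DisableForLoad is simply an illustration), and it assumes you will run the reset procedure afterward to re-enable and re-trust everything:

 CREATE PROCEDURE Utility.Constraints$DisableForLoad
 (
     @table_name sysname = '%',
     @table_schema sysname = '%'
 ) AS
 -- ----------------------------------------------------------------
 -- A sketch only: disables foreign key and check constraints on the matching
 -- tables so a bulk load can run faster. Re-enable (and re-trust) them afterward
 -- with Utility.Constraints$ResetEnableAndTrustedStatus
 -- ----------------------------------------------------------------
 BEGIN
     SET NOCOUNT ON;
     DECLARE @statement nvarchar(1000);
     DECLARE @statements CURSOR;

     SET @statements = CURSOR FOR
         SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(parent_object_id))
                + '.' + QUOTENAME(OBJECT_NAME(parent_object_id))
                + ' NOCHECK CONSTRAINT ' + QUOTENAME(name)
         FROM   sys.foreign_keys
         WHERE  OBJECT_SCHEMA_NAME(parent_object_id) LIKE @table_schema
           AND  OBJECT_NAME(parent_object_id) LIKE @table_name
         UNION ALL
         SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(parent_object_id))
                + '.' + QUOTENAME(OBJECT_NAME(parent_object_id))
                + ' NOCHECK CONSTRAINT ' + QUOTENAME(name)
         FROM   sys.check_constraints
         WHERE  OBJECT_SCHEMA_NAME(parent_object_id) LIKE @table_schema
           AND  OBJECT_NAME(parent_object_id) LIKE @table_name;

     OPEN @statements;
     FETCH FROM @statements INTO @statement;
     WHILE @@FETCH_STATUS = 0
     BEGIN
         EXEC (@statement);
         FETCH FROM @statements INTO @statement;
     END;
 END;
 GO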

Logging Objects

In many systems, you will find you need to watch the activities that are occurring in the database. Back in Chapter 7 we implemented an audit trail using triggers, choosing between putting the audit objects in the source table's schema or in some form of utility schema. In this section, the logging we will do is of a more generic kind, intended mostly for the DBAs to see what errors have been occurring.

As an example, one thing that we often want to log is errors. A common goal these days is to make sure that no errors occur at the database level, and logging any errors that do occur to a table can help you see where you have recurring issues. Probably the most interesting way this has ever helped me was when a programmer had simply ignored all return values from SQL calls, so a large number of calls to the system were failing, but the client never realized it. (In Appendix B, I will employ this functionality in the trigger templates that I provide.)

The Utility.ErrorLog$Insert procedure is used to log, in a table, the errors that occur, giving you a history of the errors that have happened. I do this because, in almost every case, an error that occurs in a trigger is a bad thing: the fact that the client sends data that causes the trigger to fail should be treated as a bug and fixed. In stored procedures, this may or may not be the case, as stored procedures can be written to do things that may work, or may fail, in some situations. This is a very broad statement, and in some cases may not be true, so adjust the code to fit your needs.

The DDL for the table is as follows:

 CREATE TABLE Utility.ErrorLog(
     ErrorLogId int NOT NULL IDENTITY CONSTRAINT PKErrorLog PRIMARY KEY,
     Number int NOT NULL,
     Location sysname NOT NULL,
     Message varchar(4000) NOT NULL,
     LogTime datetime2(3) NULL
         CONSTRAINT DFLTErrorLog_error_date DEFAULT (SYSDATETIME()),
     ServerPrincipal sysname NOT NULL
         --use original_login to capture the user name of the actual user,
         --not a user they have impersonated
         CONSTRAINT DFLTErrorLog_error_user_name DEFAULT (ORIGINAL_LOGIN())
 );

Then we create the following procedure, which can be called from other procedures and triggers whenever you need to log that an error occurred:

 CREATE PROCEDURE Utility.ErrorLog$Insert
 (
     @ERROR_NUMBER int,
     @ERROR_LOCATION sysname,
     @ERROR_MESSAGE varchar(4000)
 ) AS
 -- ----------------------------------------------------------------
 -- Writes a row to the error log. If an error occurs in the attempt (such as a
 -- NULL value in a required column), it writes a generic row to the error table
 -- instead. If that attempt also fails, an error will be returned to the caller
 --
 -- 2012 Louis Davidson – [email protected] – drsql.org
 -- ----------------------------------------------------------------
 BEGIN
     SET NOCOUNT ON;
     BEGIN TRY
         INSERT INTO Utility.ErrorLog(Number, Location, Message)
         SELECT @ERROR_NUMBER, COALESCE(@ERROR_LOCATION,'No Object'), @ERROR_MESSAGE;
     END TRY
     BEGIN CATCH
         INSERT INTO Utility.ErrorLog(Number, Location, Message)
         VALUES (-100, 'Utility.ErrorLog$Insert',
                 'An invalid call was made to the error log procedure ' +
                 ERROR_MESSAGE());
     END CATCH
 END;

Then we test the error handler with a simple test case, as follows:

 --test the error block we will use
 BEGIN TRY
     THROW 50000,'Test error',16;
 END TRY
 BEGIN CATCH
     IF @@TRANCOUNT > 0
         ROLLBACK TRANSACTION;

     --[Error logging section]
     DECLARE @ERROR_NUMBER int = ERROR_NUMBER(),
             @ERROR_PROCEDURE sysname = ERROR_PROCEDURE(),
             @ERROR_MESSAGE varchar(4000) = ERROR_MESSAGE();
     EXEC Utility.ErrorLog$Insert @ERROR_NUMBER,@ERROR_PROCEDURE,@ERROR_MESSAGE;

     THROW; --will halt the batch or be caught by the caller's catch block
 END CATCH

This returns the error we threw.

 Msg 50000, Level 16, State 16, Line 3

 Test error

And checking the ErrorLog, you can see that the error is logged. (Say that three times fast. I dare you.)

 SELECT *

 FROM Utility.ErrorLog;

This returns the following:

ErrorLogId  Number  Location   Message     LogTime                  ServerPrincipal
----------- ------- ---------- ----------- ------------------------ --------------------
1           50000   No Object  Test error  2012-02-01 22:18:25.615  DENALI-PC\AlienDrsql

This basic error logging procedure can make it much easier to understand what has gone wrong when a user has an error. Expand your own system to meet your organization’s needs, but having an audit trail will prove invaluable when you find out that certain types of errors have been going on for weeks and your users “assumed” you knew about it!

The only real downside to logging in this manner is transactions. You can log all you want, but if the logging procedure is called within a transaction, and that transaction is rolled back, the log row will be rolled back along with it. To write to a log that isn't affected by transactions, you can use the xp_logevent extended stored procedure in the error handler to write to the Windows event log instead. Using this method can be handy if you have deeply nested errors, in which all the Utility.ErrorLog rows get rolled back due to external transactions.
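As a minimal sketch of that approach (the error number 50001 and the message format are just illustrative choices), the CATCH block could write to the event log alongside, or instead of, the table insert:

 --A sketch only: write the error to the Windows Application event log so the
 --record survives a rollback; xp_logevent requires a user-defined error number
 --of 50000 or higher, plus db_owner membership in master or the sysadmin role
 DECLARE @msg nvarchar(2048) =
     N'Error ' + CAST(ERROR_NUMBER() AS nvarchar(10)) + N' in '
     + COALESCE(ERROR_PROCEDURE(), N'ad hoc batch') + N': ' + ERROR_MESSAGE();

 EXEC master.dbo.xp_logevent 50001, @msg, 'ERROR';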

Other Possibilities…

In the end, it is really going to be up to your database's particular needs to determine exactly what you put into every (or certainly more than one) database in your organization. In practice, I do tend to see some utilities that operate at the server level, for backups, generic index maintenance, and other sorts of server-wide maintenance, positioned in a separate utility database.

In my practice, anytime the application and/or any database code references code or data from within the database, everything gets created in the database with the other code. Some other types of concepts that I have seen added to the database to enhance the application are:

  • Functions that aren’t implemented in SQL Server : User-defined functions have plenty of pros and cons when used in T-SQL statements (particularly in a WHERE clause), but they definitely have their place for wrapping common, especially complex, functionality into a neat package. For example, consider the functionality we covered in the earlier section entitled “Separating Comma Delimited Items.” We might create a function Tools.String$Split(@ValueToSplit) that encapsulates the string splitting for reuse (a sketch of such a function follows this list).
  • Security : Microsoft provides a lot of security-oriented system functions for you to determine who the user is, but sometimes they aren’t enough. For some systems, I have employed a function in a schema named Security to allow for overriding the system context information when needed. The goal is that whenever you need to know what user is making changes to your system, you can call a Security.UserName$Get() function and let it do the heavy lifting, whether that means looking to the ORIGINAL_LOGIN() system function or using a custom-built security system.
  • Reference/demographic information : It is very common for systems to need the city, state, zip/postal code, phone number, and/or country information of customers/clients. Several organizations (including the United States Postal Service) publish the domain data that defines the regions and format of this data, but in every case I have seen, they provide the data in arcane formats that are difficult to work with. So we create a demographics database with a schema (we use Reference for ours) that we can duplicate in any system that needs this sort of data. That way, we only have to load it once from the foreign format, and it is in the expected format everywhere else it is needed, making it easier to share around the enterprise.
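As a sketch of the first bullet in this list (the Tools schema, the function name, and the single-character delimiter are just illustrative choices, and the recursive CTE will fail for lists with more than about 100 items because of the default recursion limit), such a function might look like this:

 CREATE FUNCTION Tools.String$Split
 (
     @ValueToSplit varchar(8000),
     @Delimiter char(1)
 )
 RETURNS TABLE
 AS
 RETURN (
     --A sketch only: a numbers-table-based splitter would usually perform better
     WITH Splitter AS (
         SELECT 1 AS StartPosition,
                CHARINDEX(@Delimiter, @ValueToSplit + @Delimiter) AS EndPosition
         UNION ALL
         SELECT EndPosition + 1,
                CHARINDEX(@Delimiter, @ValueToSplit + @Delimiter, EndPosition + 1)
         FROM   Splitter
         WHERE  EndPosition < LEN(@ValueToSplit + @Delimiter)
     )
     SELECT SUBSTRING(@ValueToSplit, StartPosition, EndPosition - StartPosition) AS Value
     FROM   Splitter
 );
 GO

A call such as SELECT Value FROM Tools.String$Split('Apple,Pear,Kiwi', ','); would then return one row per item.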

As we said, the sky is the limit; use your imagination as to how you can build up your code base in a way that is reusable and also will enhance your application’s needs.

Summary

In most chapters we end with a section covering best practices, but in this chapter there really is only one primary best practice: keep your code encapsulated within the bounds of each database. Even if you use the same code in every database and it is the same now, it may not remain that way forever. One of the biggest lessons we should have learned from our procedural programmer cousins is that trying to manage more than one user of a single copy of code leads to an excessively warm condition where the gatekeeper wears red pajamas and carries a pitchfork, since either every user needs to be updated simultaneously, or you end up with bunches of slightly different copies of code and no real idea of who is using what.

The main thrust of the chapter was to demonstrate a number of interesting tools you can use to improve your databases for the convenience of the programmers and DBAs, as well as a number of ideas of how you might start to expand your own database implementations, including:

  • Numbers table: A table of numbers that you can use to perform actions where a sequence of values might be useful. For example, finding gaps in another sequence of values (like identity values).
  • Calendar table: A table that helps you implement methods of grouping date data using typical relational methods, both for normal date groupings like year, month, and so on, and for custom groupings like corporate fiscal calendars, sales periods, and holidays. While the most common usage might be for data warehouse/reporting systems, having the calendar table available in your OLTP database can be very useful.
  • Utilities: Every programmer has code that they use to make their job easier. I presented conceptual utilities to monitor usage of the system and extended DDL to support operations that aren’t part of the base DDL in T-SQL. Having them accessible in the database means that you can always be sure that the version that the programmer expected to be there will be.
  • Logging actions: Utilities to log the actions of users in the database, generally for system management reasons. A common use is an error log to capture when and where errors are occurring.
  • Any other use that you can dream up to make it easier for you, the programmers, and the DBAs to work with the database on any server, from development, to QA, and production. The more add-ins you can create that can make the experience across database implementations more consistent the better off you will be once your database is out in the world being used and, of course, maintained for years (likely decades) to come.

The goal is now, and should always be, to make the database its own universe. With the new contained database features in SQL Server 2012, and SQL Azure gaining traction, the database container is going to be closing up tighter than your grandmother’s Tupperware casserole keeper. And even if it weren’t, the more of the functionality that ends up in the database, the easier it will be to test that everything works, and that a change in one system will have no effect on another. And that should mean fewer questions at 3:00 a.m. about why something failed because one database wasn’t available when another needed it. That can’t be a bad thing, can it?

