Reusable Standard Database Components
One little spark, of inspiration, is at the heart, of all creation. Right at the start, of everything that’s new. One little spark, lights up for you.
— The Sherman Brothers for Disney
As we near the end of the database design process, the database is pretty much complete from a design standpoint. We have spent time looking at performance, concurrency, and security patterns that you can follow to help implement the database in a manner that will work well under almost any typical OLTP-style load. In this chapter (and again somewhat in the next two), we are going to look at applying “finishing touches” to the database that can enhance the user experience and assist with querying and maintaining the database. We won’t flesh out all of the details for every concept in this chapter, because a lot of the examples include parts of the system that are decidedly more aligned to the DBA than the architect/programmer (or are just far too large to fit in this book). However, the point I will be making here is that our goal is to end up with a self-contained database with as much of the database coding and maintenance functionality as possible. In Chapter 9, we introduced the concept of a contained database to help maintain a secure and portable database, and here I will present some additional add-in capabilities and expansion-point ideas to make your database that much more usable.
The reality of database design is that databases are rarely cookie-cutter affairs. Most companies, even when they buy a third-party package to implement some part of their business, are going to end up making (in many cases substantial) customizations to the database to fit their needs. If you are starting a very large project, you may even want to look at previous models or perhaps pre-built “universal data models,” such as those in Len Silverston’s series of books, the first of which is The Data Model Resource Book: A Library of Universal Data Models for All Enterprises (Wiley, 2001) (perhaps the only book on database design with a longer title than the book you hold in your hands right now). Karen Lopez (@datachick on Twitter) frequently speaks on the subject of universal models in the SQL PASS universe of presenters that I am generally involved with. Even these universal models may only be useful as starting points to help you map from your “reality” to a common view of a given sort of business.
In this chapter, however, I want to explore the parts of the database that I find to be useful and almost always the same for every database I create. Not every database will contain all of what I will present, but when I need a common feature, I use the exact same code in every database, with the obvious caveat that I am constantly looking for new ways to improve almost everything I use over time (not to mention it gives me something to write about when I run out of Lego sets to build). Hence, sometimes a database may use an older version of a feature until it can be upgraded. In the sections that follow, I will cover several of these common features, starting with a precalculated numbers table and then a calendar table.
No two databases look exactly alike; even two that do almost the same thing will rarely be all that similar unless the designer is the same. But by following the implementation patterns we have discussed throughout the book thus far, along with the practices we will discuss in this chapter, we can produce databases that are similar enough that the people supporting your work will have an easier time figuring out what you had in mind.
If you are dealing with a third-party system where you are forbidden to add any of your own objects, even in a schema that is separated from the shipped schemas, what I am saying here still applies to you. All of the example code presented assumes a single-database approach. In the forbidden-objects case, a common approach is to create a companion database where you locate the code you need to access the third-party database from the database tier. Some examples would need to be slightly reworked for that model, but the rework would be minimal.
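As a minimal sketch of the companion-database idea (the database, schema, and table names here are hypothetical, purely for illustration), you might reference the third-party tables through synonyms so your tool code in the companion database reads naturally:

--Assumed names: ThirdPartyDb is the vendor database you cannot modify,
--CompanionDb is your own database holding the Tools schema
USE CompanionDb;
GO
CREATE SYNONYM Tools.ThirdPartyCustomer FOR ThirdPartyDb.dbo.Customer;
GO
--Tool code can now reference Tools.ThirdPartyCustomer without
--hard-coding the cross-database name throughout your objects
SELECT COUNT(*)
FROM Tools.ThirdPartyCustomer;

If the vendor database is ever renamed or moved, only the synonyms need to change, not every piece of code that references them.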
Note For the examples in this chapter, I am going to use a copy of the Adventureworks2012 database to stick to the supposition of the chapter that we should place the tables in the database with the data you are working with. If you are working with a community version of AdventureWorks2012 that you cannot modify, you can build your own companion database for the examples. I will include a comment in the query to note where the data is specifically from that database. In cases where cross database access will not be trivial, I will note that in the code with a comment.
Numbers Table
A numbers table is a precalculated table of numbers, primarily non-negative integers, which you can use for some purpose. The name “numbers” is pretty open ended, but getting so specific as nonNegativeIntegers is going to get you ridiculed by the other programmers on the playground. In previous editions of the book I have used the name sequence, but with the addition of the sequence object, the name “numbers” was the next best thing. We will use the numbers table when we need to work with data in an ordered manner, particularly a given sequence of numbers. For example, if you needed a list of the top ten products sold and you only sold six, you would have to somehow manufacture four more rows for display. Having a table where you can easily output a sequence of numbers is going to be a very valuable asset at times indeed.
While you can make the numbers table contain any type of numbers you may need, usually it is just a simple table of non-negative integers from 0 to some reasonable limit, where reasonable is more or less how many you find you need. I generally load mine by default up to 99999 (99999 gives you a full five digits and is a very convenient number for the query I will use to load the table). With the algorithm I will present, you can easily expand to create a sequence of numbers that is larger than you can store in SQL Server.
There are two really beautiful things about this concept. First, the table of non-negative integers has some great uses for dealing with text data, as well as for doing all sorts of math. Second, you can create additional attributes, or even other sequence tables, to represent other sets of numbers that you find useful or interesting, such as prime numbers or squares.
In the examples in this section, we will look at several techniques you may find useful, possibly quite often. The code to generate a simple numbers table of integers is pretty simple, though it looks a bit daunting the first time you see it. It is quite fast to execute in this form, but no matter how fast it may seem, it is not going to be faster than querying a table that has the sequence of numbers precalculated and stored ahead of time.
;WITH digits (I) AS
(--set up a set of numbers from 0-9
SELECT I
FROM (VALUES (0),(1),(2),(3),(4),
(5),(6),(7),(8),(9)) AS digits (I))
,integers (I) AS (
SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I)
-- + (10000*D5.I) + (100000*D6.I)
FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
CROSS JOIN digits AS D4
--CROSS JOIN digits AS D5 CROSS JOIN digits AS D6
)
SELECT I
FROM integers
ORDER BY I;
This code will return a set of 10,000 rows, as follows:
I
-----
0
1
2
…
9998
9999
Uncommenting the code for the D5 and D6 tables will give you an order of magnitude increase for each, up to 1,000,000 rows (values 0 through 999,999). The code itself is pretty interesting (this isn’t a coding book, but it is a really useful technique). Breaking the code down, you get the following:
;WITH digits (I) AS
(--set up a set of numbers from 0-9
SELECT I
FROM (VALUES (0),(1),(2),(3),(4),
(5),(6),(7),(8),(9)) AS digits (I))
This is simply a set of ten rows from 0 to 9. The next bit is where the true brilliance begins. (No, I am not claiming I came up with this. I first saw it on Erland Sommarskog’s web site a long time ago, using a technique I will show you in a few pages to split a comma-delimited string.) You cross-join the first set over and over, multiplying each level by a greater power of 10. The result is that you get one permutation for each number. For example, since 0 is in each set, you get one permutation that results in 0. You can see this better in the following smaller set:
;WITH digits (I) AS (--set up a set of numbers from 0-9
SELECT i
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))
SELECT D1.I AS D1I, 10*D2.I AS D2I, D1.I + (10*D2.I) AS [Sum]
FROM digits AS D1 CROSS JOIN digits AS D2
ORDER BY [Sum];
This returns the following, and you can see that by multiplying the D2.I value by 10, you get the ten’s place repeated, giving you a very powerful mechanism for building a large set. In the full query, each of the additional digit table references has another power of ten in the SELECT clause multiplier, allowing you to create a very large set (rows removed and replaced with ... for clarity and to save a tree):
D1I | D2I | Sum |
------------- | ------------- | ------------- |
0 | 0 | 0 |
1 | 0 | 1 |
2 | 0 | 2 |
3 | 0 | 3 |
4 | 0 | 4 |
5 | 0 | 5 |
6 | 0 | 6 |
7 | 0 | 7 |
8 | 0 | 8 |
9 | 0 | 9 |
0 | 10 | 10 |
1 | 10 | 11 |
2 | 10 | 12 |
3 | 10 | 13 |
4 | 10 | 14 |
... | ||
6 | 80 | 86 |
7 | 80 | 87 |
8 | 80 | 88 |
9 | 80 | 89 |
0 | 90 | 90 |
1 | 90 | 91 |
2 | 90 | 92 |
3 | 90 | 93 |
4 | 90 | 94 |
5 | 90 | 95 |
6 | 90 | 96 |
7 | 90 | 97 |
8 | 90 | 98 |
9 | 90 | 99 |
This kind of combination of sets is a very useful technique in relational coding. As I said earlier, this isn’t a query book, but I feel it necessary to show you the basics of why this code works, because it is a very good mental exercise. Using the full query, you can create a sequence of numbers that you can use in a query.
So, initially create a simple table named Number with a single column I (I is a typical value used in math to denote an index in a sequence, such as xi, where the i denotes a position in a sequence of values of x. The primary purpose of the Number table is to introduce an ordering to a set to assist in an operation). I will create this table in a schema named Tools to contain the types of tool objects, functions, and procedures we will build in this chapter. In all likelihood, this is a schema you would grant EXECUTE and SELECT on to public, making the tools available to any user you have given ad hoc query access to.
USE AdventureWorks2012;
GO
CREATE SCHEMA Tools;
GO
CREATE TABLE Tools.Number
(
I int NOT NULL CONSTRAINT PKTools_Number PRIMARY KEY
);
Then I will load it with integers from 0 to 99999.
;WITH digits (I) AS (--set up a set of numbers from 0-9
SELECT I
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))
--builds a table from 0 to 99999
,Integers (I) AS (
SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I) + (10000*D5.I)
--+ (100000*D6.I)
FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
CROSS JOIN digits AS D4 CROSS JOIN digits AS D5
/* CROSS JOIN digits AS D6 */)
INSERT INTO Tools.Number(I)
SELECT I
FROM Integers;
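With the table loaded, the “manufacture extra rows” use case from the start of this section becomes a simple outer join. The following is a hedged sketch only: the Sales.TopProduct table and its ProductRank and ProductName columns are hypothetical names invented for illustration, not part of AdventureWorks2012.

--Show all ten ranking slots even if fewer than ten products exist
SELECT Number.I AS Rank,
       TopProduct.ProductName --NULL for any unfilled slot
FROM Tools.Number
          LEFT JOIN Sales.TopProduct --hypothetical table
               ON TopProduct.ProductRank = Number.I
WHERE Number.I BETWEEN 1 AND 10
ORDER BY Number.I;

If only six products were sold, slots 7 through 10 still appear, with NULL product names, which a report can render as empty rows.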
So if you wanted to count the integers between 1 and 1000 (inclusive), it is as simple as:
SELECT COUNT(*)
FROM Tools.Number
WHERE I between 1 and 1000;
Of course, that is a bit simplistic, and there had better be 1000 values between 1 and 1000 (inclusive). But what if you wanted the number of integers between 1 and 1000 that are divisible by 9 or 7?
SELECT COUNT(*)
FROM Tools.Number
WHERE I BETWEEN 1 AND 1000
AND (I % 9 = 0 OR I % 7 = 0);
This returns the obvious answer: 238. Sure, a math nerd could sit down and write a formula to do this, but why? And if you find you need these values quite often, you could create a table called Tools.DivisibleByNineOrSevenNumber, or add columns to the Number table called DivisibleByNineFlag and DivisibleBySevenFlag if the values were needed in the context of integers that were not divisible by 9 or 7. The simple numbers table is the most typical need, but you can make a table of any sort of numbers that you need (prime numbers? squares? cubes?). The last example of this chapter is an esoteric demonstration of the (nerdy) fun you can have with a table of numbers, but for OLTP use, the goal will be (as we discussed in Chapter 5 on normalization) to precalculate values only when they are used often and can never change. Numbers-type tables are an excellent candidate for storing precalculated values, because the sets of integers and prime numbers are the same now as they were back in 300 BC when Euclid was working with them.
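A minimal sketch of the flag-column idea just mentioned, using persisted computed columns (the column names come from the text; whether to persist them, or to use flags at all, is a design judgment call, not a prescription):

ALTER TABLE Tools.Number
   ADD DivisibleByNineFlag AS CASE WHEN I % 9 = 0 THEN 1 ELSE 0 END PERSISTED,
       DivisibleBySevenFlag AS CASE WHEN I % 7 = 0 THEN 1 ELSE 0 END PERSISTED;

--The earlier count can then be written against the flags
SELECT COUNT(*)
FROM Tools.Number
WHERE I BETWEEN 1 AND 1000
   AND (DivisibleByNineFlag = 1 OR DivisibleBySevenFlag = 1);

Because the modulo expressions are deterministic, SQL Server allows the columns to be PERSISTED, so the divisibility test is computed once at load time rather than on every query.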
In this section, I will present the following uses of the numbers table to get you going: determining the contents of a string, finding gaps in a sequence of numbers, separating comma-delimited items, and, finally, some stupid mathematic tricks.
Determining the Contents of a String
As a fairly common example usage, it sometimes occurs that a value in a string you are dealing with is giving your code fits, but it isn’t easy to find what the issue is. If you want to look at the Unicode (or ASCII) value for every character in a string, you can do something like the following:
DECLARE @string varchar(20) = 'Hello nurse!';
SELECT Number.I as Position,
SUBSTRING(split.value,Number.I,1) AS [Character],
UNICODE(SUBSTRING(split.value,Number.I,1)) AS [Unicode]
FROM Tools.Number
CROSS JOIN (SELECT @string AS value) AS split
WHERE Number.I > 0 --No zeroth position
AND Number.I <= LEN(@string)
ORDER BY Position;
This returns the following:
Position | Character | Unicode |
------------- | ------------- | ------------- |
1 | H | 72 |
2 | e | 101 |
3 | l | 108 |
4 | l | 108 |
5 | o | 111 |
6 |   | 32 |
7 | n | 110 |
8 | u | 117 |
9 | r | 114 |
10 | s | 115 |
11 | e | 101 |
12 | ! | 33 |
This in and of itself is interesting, and sometimes when you execute this, you might see a little square character that can’t be displayed and a really large/odd Unicode value (like 20012, picking one randomly) that you didn’t expect in your database of English-only words. What really makes the technique awesome is that not only did we not have to write a routine to go column by column, we won’t have to go row by row either. Using a simple join, you can easily do this for a large number of rows at once, this time joining to a table in the AdventureWorks2012 database that can provide us with an easy example set.
SELECT LastName, Number.I AS position,
SUBSTRING(Person.LastName,Number.I,1) AS [char],
UNICODE(SUBSTRING(Person.LastName, Number.I,1)) AS [Unicode]
FROM /*Adventureworks2012.*/ Person.Person
JOIN Tools.Number
ON Number.I <= LEN(Person.LastName )
AND UNICODE(SUBSTRING(Person.LastName, Number.I,1)) IS NOT NULL
ORDER BY LastName;
This returns 111,969 rows (one for each character in a last name) in only around 3 seconds on a virtual machine hosted on my writing laptop (a quite decent Alienware M11x: Core 2 Duo 1.3 GHz, 8 GB RAM, 250 GB 7200-RPM SATA drive, 11.6-inch netbook-sized laptop).
LastName | position | char | Unicode |
----------------- | ---------- | ------------------ | ------------------ |
Abbas | 1 | A | 65 |
Abbas | 2 | b | 98 |
Abbas | 3 | b | 98 |
Abbas | 4 | a | 97 |
Abbas | 5 | s | 115 |
Abel | 1 | A | 65 |
Abel | 2 | b | 98 |
Abel | 3 | e | 101 |
Abel | 4 | l | 108 |
...... | ... | . | ... |
With that set, you could easily start eliminating known-safe Unicode values with a simple WHERE clause and find the evil outlier that is causing some issue with some process. For example, you could find all names that include a character not in the normal A–Z, space, apostrophe, or dash characters.
SELECT LastName, Number.I AS Position,
SUBSTRING(Person.LastName,Number.I,1) AS [Char],
UNICODE(SUBSTRING(Person.LastName, Number.I,1)) AS [Unicode]
FROM /*Adventureworks2012.*/ Person.Person
JOIN Tools.Number
ON Number.I <= LEN(Person.LastName )
AND UNICODE(SUBSTRING(Person.LastName, Number.I,1)) IS NOT NULL
--Note I used both a-z and A-Z in LIKE in case of case sensitive AW database
WHERE SUBSTRING(Person.LastName, Number.I,1) NOT LIKE '[a-zA-Z ~''~-]' ESCAPE '~'
ORDER BY LastName, Position;
This returns the following:
LastName | position | char | Unicode |
----------------- | ---------- | ------------------ | ------------------ |
Mart¡nez | 5 | ¡ | 161 |
This can be a remarkably powerful tool when trying to figure out what data is hurting your application with some unsupported text, particularly when dealing with a stream of data from an outside source.
Finding Gaps in a Sequence of Numbers
Another common issue arises when using surrogate keys: there can be gaps in their values. Ideally, gaps should not be a problem, but when troubleshooting errors, it is often useful to be able to determine the missing numbers in a range. For example, say you have a table with a domain of values between 1 and 10. How might you determine which values aren’t used? That case is fairly simple; you can just do a DISTINCT query on the used values and eyeball which ones aren’t there. But how about finding the missing values among 20,000+ distinct values? That approach is not going to work if a lot of values are unused. For example, consider the Person table in the AdventureWorks2012 database. Running the following query, you can see that not every BusinessEntityID is used.
SELECT MIN(BusinessEntityID) AS MinValue, MAX(BusinessEntityID) AS MaxValue,
MAX(BusinessEntityID) - MIN(BusinessEntityID) + 1 AS ExpectedNumberOfRows,
COUNT(*) AS NumberOfRows,
MAX(BusinessEntityID) - COUNT(*) AS MissingRows
FROM /*Adventureworks2012.*/ Person.Person;
This returns the following:
MinValue | MaxValue | ExpectedNumberOfRows | NumberOfRows | MissingRows |
----------------- | ---------- | ------------------ | ------------------ | ------------------ |
1 | 20777 | 20777 | 19972 | 805 |
So we know that there are 805 rows missing between BusinessEntityID values 1 and 20777. To discover these rows, we take a set of values from 1 to 20777 with no gaps, and subtract the rows using the EXCEPT relational operator:
SELECT Number.I
FROM Tools.Number
WHERE I BETWEEN 1 AND 20777
EXCEPT
SELECT BusinessEntityID
FROM /*Adventureworks2012.*/ Person.Person;
Execute this query and you will find that there are 805 rows returned. Using the subtraction method with the Numbers table is a very powerful method that you can use in lots of situations where you need to find what isn’t there rather than what is.
Separating Comma Delimited Items
My last example that translates to a direct business need comes from Erland Sommarskog’s web site ( www.sommarskog.se/ ) material on arrays in SQL Server, as well as Aaron Bertrand’s old ASPFAQ web site. Using this code, you can take a comma-delimited list and return it as a table of values (which is the most desirable form for data in SQL Server, in case you have just started reading this book on this very page and haven’t learned about normalization yet).
DECLARE @delimitedList varchar(100) = '1,2,3';
SELECT SUBSTRING(',' + @delimitedList + ',',I + 1,
CHARINDEX(',',',' + @delimitedList + ',',I + 1) - I - 1) AS value
FROM Tools.Number
WHERE I >= 1
AND I < LEN(',' + @delimitedList + ',') - 1
AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','
ORDER BY I;
This returns the following:
Value
------
1
2
3
The way this code works is pretty interesting in and of itself. It works by doing a substring on each row. The key is in the WHERE clause.
WHERE I >= 1
AND I < LEN(',' + @delimitedList + ',') - 1
AND SUBSTRING(',' + @delimitedList + ',', i, 1) = ','
The first line is there because SUBSTRING starts with position 1. The second limits the rows in Tools.Number to no more than the length of the @delimitedList value (with its added leading and trailing commas). The third includes only rows where the SUBSTRING of the value at that position returns the delimiter, in this case, a comma. So, take the following query:
DECLARE @delimitedList varchar(100) = '1,2,3';
SELECT I
FROM Tools.Number
WHERE I >= 1
AND I < LEN(',' + @delimitedList + ',') - 1
AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','
ORDER BY I;
Executing this, you will see the following results, showing you the position of each value in the list:
I
-----
1
3
5
Since the list has a comma added to the beginning and end of it in the query, you will see that the positions of the commas are represented in the list. The SUBSTRING in the SELECT clause of the main query simply fetches all of the @delimitedList value up to the next comma. This use of the sequence table lets you do, all at once in a set-based manner that is actually very fast, what at first seems like it would require a massive iterating algorithm touching each position in the string individually (which would be slow in T-SQL, though you might get away with it in the CLR).
Finally, I have expanded on this technique to handle every row in a table that needs it, by joining to the Tools.Number table on the values between 1 and the length of the string (plus delimiters). The best use for this code is to normalize a set of data where some programmer thought it was a good idea to store data in a comma-delimited list (it rarely is), so that you can use proper relational techniques to work with the data. It is actually pretty sad how often you may find this query useful.
CREATE TABLE dbo.poorDesign
(
poorDesignId int,
badValue varchar(20)
);
INSERT INTO dbo.poorDesign
VALUES (1,'1,3,56,7,3,6'),
(2,'22,3'),
(3,'1');
The code just takes the stuff in the WHERE clause of the previous query and moves it into JOIN criteria.
SELECT poorDesign.poorDesignId AS betterDesignId,
SUBSTRING(',' + poorDesign.badValue + ',',I + 1,
CHARINDEX(',',',' + poorDesign.badValue + ',', I + 1) - I - 1)
AS betterScalarValue
FROM dbo.poorDesign
JOIN Tools.Number
ON I >= 1
AND I < LEN(',' + poorDesign.badValue + ',') - 1
AND SUBSTRING(',' + poorDesign.badValue + ',', I, 1) = ',';
This returns the following:
betterDesignId | betterScalarValue |
----------------- | ----------------- |
1 | 1 |
1 | 3 |
1 | 56 |
1 | 7 |
1 | 3 |
1 | 6 |
2 | 22 |
2 | 3 |
3 | 1 |
Ah…that’s much better. Each row of the output represents only a single value, not an array of values. As I have said many times throughout the book, SQL works great with atomic values, but try to get individual values out of a single column, and you get ugly code like I have just presented. The code is an excellent solution; it is the fault of the problem that makes it ugly. Now just create a table using this better design of having one row per scalar value, insert the data, and drop the badly designed table. I won’t create the better design here, but we do need to clean up the poorDesign table with the following, lest someone stumble upon it and use it as a good idea:
DROP TABLE dbo.poorDesign;
Stupid Mathematic Tricks
I want to give you a final, (hopefully) entertaining, and esoteric usage of the sequence table to get your mind working on the possibilities. One of my favorite episodes of Futurama is an episode called “Lesser of Two Evils.” In this episode, Bender and a Bender look-alike named Flexo start talking and have the following exchange (they are both Bender units…did someone call for a nerd?):
Bender: Hey, brobot, what’s your serial number?
Flexo: 3370318.
Bender: No way! Mine’s 2716057!
Fry (a human): I don’t get it.
Bender: We’re both expressible as the sum of two cubes!
So, I figured, the sum of two cubes would be an interesting and pretty easy abstract utilization of the numbers table. “Taxicab” numbers are also mentioned on the ScienceNews.org web site, where the goal is to discover the smallest value that can be expressed as the sum of two cubes in N different ways. They are called taxicab numbers because of an old (and pretty darn nerdy) story in which one mathematician remarked to another that the number 1729 on a taxicab was “dull,” to which the other remarked that it was very interesting, because it is the smallest number that can be expressed as the sum of two cubes in two different ways. (You can judge your own nerdiness by whether you think: A. This is stupid; B. This is cool; or C. You have done it yourself.)
How hard is the query? It turns out that once you have a sequence table with numbers from 1 to 100,000 or so, you can calculate that Taxicab(2) = 1729 very easily (along with all of the other numbers that are the sum of two cubes), and the sum of two cubes in three different ways almost as easily. The latter took three seconds on my laptop, and that value is 87539319.
But instead of calculating the value of each integer cubed (POWER(I,3)) on each execution, you can add a computed column to the table, this time as a bigint to give the calculations room to store the very large values produced when you cube the larger integers (cubing even moderately large integers overflows int). You can do something like the following:
ALTER TABLE Tools.Number
ADD Ipower3 AS CAST(POWER(CAST(I AS bigint),3) AS bigint) PERSISTED;
--Note that I had to cast I as bigint first to let the power function
--return a bigint
Now, here is the code:
DECLARE @level int = 2; --sum of two cubes in @level ways
;WITH cubes AS
(SELECT Ipower3
FROM Tools.Number
WHERE I >= 1 AND I < 500) --<<<Vary for performance, and for cheating reasons,
--<<<max needed value
SELECT c1.Ipower3 + c2.Ipower3 AS [sum of 2 cubes in @level Ways]
FROM cubes AS c1
CROSS JOIN cubes AS c2
WHERE c1.Ipower3 <= c2.Ipower3 --this gets rid of the "duplicate" value pairs
GROUP BY (c1.Ipower3 + c2.Ipower3)
HAVING count(*) = @level
ORDER BY [sum of 2 cubes in @level Ways];
This will return 559 rows in just a second or two. The first row is 1729, which is the smallest number that is the sum of two cubes in two different ways. Breaking this down, the cubes CTE is pretty simple:
(SELECT Ipower3
FROM Tools.Number
WHERE I >= 1 AND I < 500)
This limits the values to a table of the cubes of the first 499 integers. The next part of the query is a bit more interesting. I sum two cube values, which I get by cross-joining the CTE to itself.
SELECT c1.Ipower3 + c2.Ipower3 AS [sum of 2 cubes in @level Ways]
FROM cubes AS c1
CROSS JOIN cubes AS c2
WHERE c1.Ipower3 <= c2.Ipower3 --this gets rid of the "duplicate" value pairs
The WHERE condition of c1.Ipower3 <= c2.Ipower3 gets rid of the “duplicate” value pairs, since c1 and c2 have the same values; without it, for 1729 you would get the following:
c1.Ipower3 | c2.Ipower3 |
----------------- | ----------------- |
1 | 1728 |
729 | 1000 |
1000 | 729 |
1728 | 1 |
These pairs are technically the same, just reversed. I use <= rather than < so as not to eliminate the case where both cube values are equal; such a pair won’t be doubled up in this set, and if a sum of two equal cubes matched a different sum of two cubes, it would still be interesting. That leaves the following values:
c1.Ipower3 | c2.Ipower3 |
----------------- | ----------------- |
1 | 1728 |
729 | 1000 |
Now you can see that 1729 is the sum of two cubes in two different ways. Lastly, the question of performance must come up. Reading the articles, it is clear that this is not a terribly easy problem to solve as the values get really large. Finding the sum of two cubes in three different ways is still fairly simple, leaving the sequence values bounded at 500:
DECLARE @level int = 3; --sum of two cubes in @level ways
;WITH cubes AS
(SELECT Ipower3
FROM Tools.Number
WHERE I >= 1 AND I < 500) --<<< Vary for performance, and for cheating reasons,
--<<< max needed value
SELECT c1.Ipower3 + c2.Ipower3 AS [sum of 2 cubes in @level Ways]
FROM cubes AS c1
CROSS JOIN cubes AS c2
WHERE c1.Ipower3 < c2.Ipower3
GROUP BY (c1.Ipower3 + c2.Ipower3)
HAVING count(*) = @level
ORDER BY [sum of 2 cubes in @level Ways];
This returns the following two rows in around a second:
sum of 2 cubes in @level Ways
-----------------------------
87539319
119824488
Four, however, was a “bit” more challenging. Knowing the answer from the article, I knew I could set a boundary for my numbers using 20000 and get the answer.
DECLARE @level int = 4 --sum of two cubes in @level ways
;WITH cubes AS
(SELECT Ipower3
FROM Tools.Number
WHERE I >= 1 and I < 20000) --<<<Vary for performance, and for cheating reasons,
--<<<max needed value
SELECT c1.Ipower3 + c2.Ipower3 AS [sum of 2 cubes in @level Ways]
FROM cubes AS c1
CROSS JOIN cubes AS c2
WHERE c1.Ipower3 < c2.Ipower3
GROUP BY (c1.Ipower3 + c2.Ipower3)
HAVING count(*) = @level
ORDER BY [sum of 2 cubes in @level Ways];
Using this “cheat” of knowing how far to extend the numbers, I was able to calculate on my laptop’s VM that the minimum value of Taxicab(4) is 6963472309248 (yes, it found only the one) in just 2 hours and 42 seconds on the same VM and laptop mentioned earlier. Clearly, the main value of T-SQL isn’t in tricks like this, but a sequence table can give you an immediate jumping-off point for solving some problems that initially seem difficult.
Calendar Table
It is a common task to need groupings and calculations based on date values. For example, you might want sales grouped by month, week, year, or any other date grouping. You can usually do this using the SQL Server date functions, but it is often costly in performance, and it is always a fairly non-obvious operation. What can truly help with this process is a table filled with date values, commonly called a calendar table. Using a calendar table is commonplace in Business Intelligence/OLAP implementations (something that is covered more in Chapter 14), but it certainly can be useful in OLTP databases when you get stuck writing a confusing date-range query.
Using the same form of precalculated logic that we applied to the numbers table, we can create a table that contains one row per date. I will make the date the primary key and then store data related to the date as columns. The following is the basic calendar table that I currently use. You can extend it as you want to include working days, holidays, special events, and so on, to filter/group by in the same manner as with these columns, along with others I will add later in the section (again, I am working in a copy of AdventureWorks2012, bolting on my tools to make life easier).
CREATE TABLE Tools.Calendar
(
DateValue date NOT NULL CONSTRAINT PKtools_calendar PRIMARY KEY,
DayName varchar(10) NOT NULL,
MonthName varchar(10) NOT NULL,
Year varchar(60) NOT NULL,
Day tinyint NOT NULL,
DayOfTheYear smallint NOT NULL,
Month smallint NOT NULL,
Quarter tinyint NOT NULL
);
Note I wanted to make several of these columns into persisted computed columns, but it turns out that the datename and datepart functions we will use to load the data were not deterministic functions due to how they have to work with regionalization, so we will store the values manually. In SQL Server 2012, there is an additional way to format date data using FORMAT (also non-deterministic), which allows for specification of regionalization information. I didn’t change my example to use FORMAT as DATEPART and DATENAME are perfectly valid and a bit more readable methods of extracting date information from a date value, but FORMAT would allow you to reference multiple cultures in a single statement if you needed it.
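To illustrate the FORMAT point from the note, here is a small sketch. FORMAT accepts an optional culture argument, so a single statement can render the same date for multiple cultures; the specific cultures chosen here are just examples, not anything required by the calendar table.

DECLARE @date date = '20120301';
SELECT FORMAT(@date, 'MMMM', 'en-US') AS EnglishMonthName,
       FORMAT(@date, 'MMMM', 'de-DE') AS GermanMonthName,
       FORMAT(@date, 'MMMM', 'fr-FR') AS FrenchMonthName;

DATENAME, by contrast, returns names only for the current session’s language setting, which is part of why neither function is deterministic enough for a persisted computed column.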
The question of normalization might come up again at this point, and it is a valid question. Since these values can be calculated (quite easily, in most cases), why store them? Isn’t this denormalization? There are a few ways to look at it, but I would generally say that it isn’t. Each row represents a day in time, and each of the columns is functionally dependent on the date value. The fact is, the functions have to look up or calculate the date attributes in some manner, so you are actually saving that lookup cost, however minimal it might be. When you have to filter on date criteria, the benefit of the design can be quite obvious. Later in the section, we will extend the table to add more interesting, company-based values, which certainly are normalized in the context of a date.
Note that the name Calendar is not truly consistent with our convention of names representing a single row, at least not in a natural way, but the other names we might use (Date, DateValue, etc.) all sound either forced or hokey. Since the table represents a calendar, and a calendar item generally represents a day, I went with it. A stretch perhaps, but naming is pretty difficult at times, especially when working in a namespace where the real world and SQL Server have established so many names.
The next step is to load the table with values, which is a pretty straightforward task using the Tools.Number table that we finished creating in the previous section. Using the DATENAME and DATEPART functions and a few simple CASE expressions, you can load the different values. I will make use of many of the functions in the examples, but most are very easy to understand.
WITH dates (newDateValue) AS (
SELECT DATEADD(day,I,'17530101') AS newDateValue
FROM Tools.Number
)
INSERT Tools.Calendar
(DateValue ,DayName
,MonthName ,Year ,Day
,DayOfTheYear ,Month ,Quarter
)
SELECT
dates.newDateValue AS DateValue,
DATENAME (dw,dates.newDateValue) AS DayName,
DATENAME (mm,dates.newDateValue) AS MonthName,
DATENAME (yy,dates.newDateValue) AS Year,
DATEPART(day,dates.newDateValue) AS Day,
DATEPART(dy,dates.newDateValue) AS DayOfTheYear,
DATEPART(m,dates.newDateValue) AS Month,
DATEPART(qq,dates.newDateValue) AS Quarter
FROM dates
WHERE dates.newDateValue BETWEEN '20000101' AND '20130101' --set the date range
ORDER BY DateValue;
Just like the numbers table, there are several commonly useful ways to use the calendar table. The first I will demonstrate is general grouping types of queries, and the second is calculating ranges. As an example of grouping, say you want to know how many sales had been made during each year in Sales.SalesOrderHeader in the AdventureWorks2012 database. This is why there is a year column in the table:
SELECT Calendar.Year, COUNT(*) as OrderCount
FROM /*Adventureworks2012.*/ Sales.SalesOrderHeader
JOIN Tools.Calendar
--note, the cast here could be a real performance killer
--consider using date columns where possible
ON CAST(SalesOrderHeader.OrderDate as date) = Calendar.DateValue
GROUP BY Calendar.Year
ORDER BY Calendar.Year;
This returns the following:
Year | OrderCount |
----------------- | ----------------- |
2005 | 1379 |
2006 | 3692 |
2007 | 12443 |
2008 | 13951 |
The beauty of the calendar table is that you can easily group values that are not otherwise simple to compute, in a really obvious manner, all while using natural relational coding techniques. For example, how would you count the sales made on Tuesdays and Thursdays?
SELECT Calendar.DayName, COUNT(*) as OrderCount
FROM /*Adventureworks2012.*/ Sales.SalesOrderHeader
JOIN Tools.Calendar
--note, the cast here could be a real performance killer
--consider using date columns where possible
ON CAST(SalesOrderHeader.OrderDate as date) = Calendar.DateValue
WHERE Calendar.DayName in ('Tuesday','Thursday')
GROUP BY Calendar.DayName
ORDER BY Calendar.DayName;
This returns the following:
DayName | OrderCount |
----------------- | ----------------- |
Thursday | 4875 |
Tuesday | 4483 |
Tip In many tables where only the date portion is used in computations like this, I will add a computed column that has the date only. For example, in the SalesOrderHeader table in AdventureWorks2012, I would have named the OrderDate column OrderTime and added a computed column named OrderDate, defined as CAST(OrderTime AS date), that could have statistics and be indexed for faster operation.
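As a sketch of the tip's approach (the index name is an assumption, and in a real system you would first check for constraints and code that reference the existing column), the change might look like this:

```sql
--Sketch only: rename the time-containing column, then add a date-only
--computed column that can carry statistics and be indexed
EXEC sp_rename 'Sales.SalesOrderHeader.OrderDate', 'OrderTime', 'COLUMN';
GO
ALTER TABLE Sales.SalesOrderHeader
    ADD OrderDate AS CAST(OrderTime AS date) PERSISTED; --deterministic, so indexable
GO
--now the join to Tools.Calendar needs no runtime CAST
CREATE INDEX XSalesOrderHeader_OrderDate
    ON Sales.SalesOrderHeader (OrderDate);
```

With this in place, the earlier queries can join `ON SalesOrderHeader.OrderDate = Calendar.DateValue` directly, and the optimizer can use the index and statistics on the computed column.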
OK, I see you are possibly still skeptical. What if I throw in the first Tuesday after the second Wednesday? How? Well, it is really a simple matter of using CTEs to build the set that you need, step by step.
;WITH onlyWednesdays AS --get all Wednesdays
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Calendar.Year, Calendar.Month
ORDER BY Calendar.Day) AS wedRowNbr
FROM Tools.Calendar
WHERE DayName = 'Wednesday'
),
secondWednesdays AS --limit to second Wednesdays of the month
(
SELECT *
FROM onlyWednesdays
WHERE wedRowNbr = 2
)
,finallyTuesdays AS --finally limit to the Tuesdays after the second wed
(
SELECT Calendar.*,
ROW_NUMBER() OVER (PARTITION BY Calendar.Year, Calendar.Month
ORDER BY Calendar.Day) AS rowNbr
FROM secondWednesdays
JOIN Tools.Calendar
ON secondWednesdays.Year = Calendar.Year
AND secondWednesdays.Month = Calendar.Month
WHERE Calendar.DayName = 'Tuesday'
AND Calendar.Day > secondWednesdays.Day
)
--and in the final query, just get the one month
SELECT Year, MonthName, Day
FROM finallyTuesdays
WHERE Year = 2012
AND rowNbr = 1;
This returns the following set of values, which appear to be the result of some advanced calculations but are actually the result of a fairly simple query (to anyone who has made it through Chapter 12 of this book, for sure!):
Year | MonthName | Day |
---- | --------- | ---- |
2012 | January | 17 |
2012 | February | 14 |
2012 | March | 20 |
2012 | April | 17 |
2012 | May | 15 |
2012 | June | 19 |
2012 | July | 17 |
2012 | August | 14 |
2012 | September | 18 |
2012 | October | 16 |
2012 | November | 20 |
2012 | December | 18 |
Now, utilizing another CTE, you could join this to the Sales.SalesOrderHeader table and find out sales on the first Tuesday after the second Wednesday. And that is exactly what I would do if I was being fed requirements by a madman (or perhaps the preferred term is marketing analyst?), and it would be done in no time. However, if this date became a corporate standard date, I would add a column to the calendar table (maybe called FirstTuesdayAfterSecondWednesdayFlag) and set it to 1 for every date that matches. (And in the one month where that day was a holiday and they wanted it to be the Thursday after the second Wednesday instead...well, the change would be a simple update.)
A more realistic application of this is a company sale that routinely starts on a given day every month, where the report needs to know how sales went for the four days after. In that case, perhaps the column would be BigSaleDaysFlag, or whatever works. The goal is the same as back in Chapter 7 when we talked about data-driven design: store data in columns, rules in code. A new type of day is just a new column and calculation, one that every user now has access to without having to know the calculation that was used to create the value.
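A hedged sketch of that pattern (the flag and constraint names are illustrative, and the UPDATE's date list stands in for whatever calculation or CTE defines the special days):

```sql
--Sketch only: add a flag column for the special day, defaulted to 0
ALTER TABLE Tools.Calendar
    ADD BigSaleDaysFlag bit NOT NULL
        CONSTRAINT DFLTCalendar_BigSaleDaysFlag DEFAULT (0);
GO
--set the flag for the matching dates; in practice this UPDATE would be
--driven by the calculation (or CTE) that defines the special days
UPDATE Tools.Calendar
SET    BigSaleDaysFlag = 1
WHERE  DateValue IN ('20120117','20120214'); --illustrative dates only
```

Every consumer now just writes `WHERE BigSaleDaysFlag = 1`, and when the business rule changes, only the UPDATE that loads the flag changes.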
In most standard calendar table utilizations, it will be common to have, at a minimum, the following generic events/time ranges:
--Saturday or Sunday set to 1, else 0
WeekendFlag bit NOT NULL,
FiscalYear smallint NOT NULL,
FiscalMonth tinyint NULL,
FiscalQuarter tinyint NOT NULL
Almost every company has a fiscal year that it uses for its business calendar. This technique allows you to treat the fiscal time periods more or less like regular calendar dates in your code, with no modification at all. I will demonstrate this after we load the data in just a few paragraphs.
This covers the basics; now I’ll discuss one last thing you can add to the calendar table to solve the problem that really shifts this into “must-have” territory: floating windows of time.
Note This is actually one of the few techniques that I created on my own. I am not actually claiming that I am the only person to ever do this, but I have not read about it anywhere else, and I built this myself when trying to solve a particular type of problem using the system functions to no avail.
The goal for the floating windows of time was to add a relative positioning value to the table so that you could easily get N time periods from a point in time. Years are already contiguous, increasing numbers, so it is easy to do math with them. But it is not particularly comfortable to do math with months. I found myself often having to get the past 12 months of activity, sometimes including the current month and sometimes not. Doing math that wraps around a 12-month calendar was a pain, so to the calendar, I add the following columns that I will load with increasing values with no gaps:
RelativeDayCount int NOT NULL,
RelativeWeekCount int NOT NULL,
RelativeMonthCount int NOT NULL
Using these columns, I will store sequence numbers that start at an arbitrary point in time. I will use '20000101' here, but the specific point is really unimportant, and negative values are not a problem either. You should never refer to the value itself, just the value's position relative to some point in time you choose. Days are numbered from that point (negative before, positive after), as are months and weeks. The following is the "final" calendar table:
DROP TABLE Tools.Calendar
GO
CREATE TABLE Tools.Calendar
(
DateValue date NOT NULL CONSTRAINT PKtools_calendar PRIMARY KEY,
DayName varchar(10) NOT NULL,
MonthName varchar(10) NOT NULL,
Year varchar(60) NOT NULL,
Day tinyint NOT NULL,
DayOfTheYear smallint NOT NULL,
Month smallint NOT NULL,
Quarter tinyint NOT NULL,
WeekendFlag bit NOT NULL,
--start of fiscal year configurable in the load process, currently
--only supports fiscal months that match the calendar months.
FiscalYear smallint NOT NULL,
FiscalMonth tinyint NULL,
FiscalQuarter tinyint NOT NULL,
--used to give relative positioning, such as the previous 10 months
--which can be annoying due to month boundaries
RelativeDayCount int NOT NULL,
RelativeWeekCount int NOT NULL,
RelativeMonthCount int NOT NULL
)
Last, I will reload the table with the following code:
;WITH dates (newDateValue) AS (
SELECT DATEADD(day,I,'17530101') AS newDateValue
FROM Tools.Number
)
INSERT Tools.Calendar
(DateValue ,DayName
,MonthName ,Year ,Day
,DayOfTheYear ,Month ,Quarter
,WeekendFlag ,FiscalYear ,FiscalMonth
,FiscalQuarter ,RelativeDayCount,RelativeWeekCount
,RelativeMonthCount)
SELECT
dates.newDateValue AS DateValue,
DATENAME (dw,dates.newDateValue) AS DayName,
DATENAME (mm,dates.newDateValue) AS MonthName,
DATENAME (yy,dates.newDateValue) AS Year,
DATEPART(day,dates.newDateValue) AS Day,
DATEPART(dy,dates.newDateValue) AS DayOfTheYear,
DATEPART(m,dates.newDateValue) AS Month,
CASE
WHEN MONTH( dates.newDateValue) <= 3 THEN 1
WHEN MONTH( dates.newDateValue) <= 6 THEN 2
WHEN MONTH( dates.newDateValue) <= 9 THEN 3
ELSE 4 END AS Quarter,
CASE WHEN DATENAME (dw,dates.newDateValue) IN ('Saturday','Sunday')
THEN 1
ELSE 0
END AS weekendFlag,
------------------------------------------------
--the next three blocks assume a fiscal year starting in July.
--change if your fiscal periods are different
------------------------------------------------
CASE
WHEN MONTH(dates.newDateValue) <= 6
THEN YEAR(dates.newDateValue)
ELSE YEAR (dates.newDateValue) + 1
END AS fiscalYear,
CASE
WHEN MONTH(dates.newDateValue) <= 6
THEN MONTH(dates.newDateValue) + 6
ELSE MONTH(dates.newDateValue) - 6
END AS fiscalMonth,
CASE
WHEN MONTH(dates.newDateValue) <= 3 then 3
WHEN MONTH(dates.newDateValue) <= 6 then 4
WHEN MONTH(dates.newDateValue) <= 9 then 1
ELSE 2 END AS fiscalQuarter,
------------------------------------------------
--end of fiscal calculations (fiscal year starting in July)
------------------------------------------------
--these values can be anything, as long as they
--provide contiguous values on year, month, and week boundaries
DATEDIFF(day,'20000101',dates.newDateValue) AS RelativeDayCount,
DATEDIFF(week,'20000101',dates.newDateValue) AS RelativeWeekCount,
DATEDIFF(month,'20000101',dates.newDateValue) AS RelativeMonthCount
FROM dates
WHERE dates.newDateValue BETWEEN '20000101' AND '20130101'; --set the date range
Now we can build a query to get only weekends, grouped by FiscalYear, as follows:
SELECT Calendar.FiscalYear, COUNT(*) AS OrderCount
FROM /*Adventureworks2012.*/ Sales.SalesOrderHeader
JOIN Tools.Calendar
--note, the cast here could be a real performance killer
--consider using a persisted calculated column here
ON CAST(SalesOrderHeader.OrderDate as date) = Calendar.DateValue
WHERE WeekendFlag = 1
GROUP BY Calendar.FiscalYear
ORDER BY Calendar.FiscalYear;
This returns the following:
FiscalYear OrderCount
---------- ----------
2006 734
2007 1114
2008 6855
2009 234
To demonstrate the floating windows of time using the Relative_____Count columns, consider that you want to count the sales for the previous two weeks. It's not impossible to do this using the date functions, but it's far simpler with a calendar table:
DECLARE @interestingDate date = '20120509';
SELECT Calendar.DateValue as PreviousTwoWeeks, CurrentDate.DateValue AS Today,
Calendar.RelativeWeekCount
FROM Tools.Calendar
JOIN (SELECT *
FROM Tools.Calendar
WHERE DateValue = @interestingDate) AS CurrentDate
ON Calendar.RelativeWeekCount < (CurrentDate.RelativeWeekCount)
and Calendar.RelativeWeekCount >=
(CurrentDate.RelativeWeekCount -2);
This returns the following:
PreviousTwoWeeks | Today | RelativeWeekCount |
----------------- | ---------- | ------------------ |
2012-04-22 | 2012-05-09 | 643 |
2012-04-23 | 2012-05-09 | 643 |
2012-04-24 | 2012-05-09 | 643 |
2012-04-25 | 2012-05-09 | 643 |
2012-04-26 | 2012-05-09 | 643 |
2012-04-27 | 2012-05-09 | 643 |
2012-04-28 | 2012-05-09 | 643 |
2012-04-29 | 2012-05-09 | 644 |
2012-04-30 | 2012-05-09 | 644 |
2012-05-01 | 2012-05-09 | 644 |
2012-05-02 | 2012-05-09 | 644 |
2012-05-03 | 2012-05-09 | 644 |
2012-05-04 | 2012-05-09 | 644 |
2012-05-05 | 2012-05-09 | 644 |
From this, we can see that the previous two weeks start on Sunday (04/22/2012 and 04/29/2012) and end on Saturday (04/28/2012 and 05/05/2012). The dates 05/06/2012 to 05/09/2012 are not included because that is this week (relative to the variable value). The basic structure of the query is simply to take a derived table that fetches the calendar row for the "interesting" date, and then join that to the full calendar table using a range condition in the join. In the previous-weeks example, I used the following:
Calendar.RelativeWeekCount < (CurrentDate.RelativeWeekCount)
AND Calendar.RelativeWeekCount >= (CurrentDate.RelativeWeekCount -2)
This is because I wanted the weeks prior to the current week, going back two weeks. Weeks aren't exactly the sweet spot of this technique, because weeks are of fixed length anyway (but they do make it easier to show a full result set in print, since I don't get paid by the page). Now, join the values to your sales table, and you can see sales for the past two weeks. Want to see it broken down by week? Group by RelativeWeekCount. Use the following if you want the previous 12 months:
DECLARE @interestingDate date = '20120509'
SELECT MIN(Calendar.DateValue) AS MinDate, MAX(Calendar.DateValue) AS MaxDate
FROM Tools.Calendar
JOIN (SELECT *
FROM Tools.Calendar
WHERE DateValue = @interestingDate) as CurrentDate
ON Calendar.RelativeMonthCount < (CurrentDate.RelativeMonthCount)
AND Calendar.RelativeMonthCount >=
(CurrentDate.RelativeMonthCount -12);
This returns the following:
MinDate | MaxDate |
----------------- | ----------------- |
2011-05-01 | 2012-04-30 |
This query includes the 12 months prior to the one loaded into @interestingDate, but notice that the results did not include the current month. Change the < to <=, and it will. If you want a 24-month window, take 12 months on either side of the current month, as follows:
ON Calendar.RelativeMonthCount < (CurrentDate.RelativeMonthCount + 12)
AND Calendar.RelativeMonthCount >=
(CurrentDate.RelativeMonthCount -12)
Using that ON clause results in the following:
MinDate | MaxDate |
----------------- | ----------------- |
2007-03-01 | 2009-02-28 |
Now you can use these dates in other criteria, either by assigning these values to a variable or (if you are one of the cool kids) by using the tables in a join to other tables. As a real example, let’s hop in the Wabac machine and look at sales in the Adventureworks2012 database around September 27, 2008. (Keep in mind that the sample databases could change in the future to get more up-to-date data, but the 2012 RTM version had dates that were the same as the 2008 and the 2008 R2 editions of AdventureWorks.)
DECLARE @interestingDate date = '20080927';
SELECT Calendar.Year, Calendar.Month, COUNT(*) AS OrderCount
FROM /*Adventureworks2012.*/ Sales.SalesOrderHeader
JOIN Tools.Calendar
JOIN (SELECT *
FROM Tools.Calendar
WHERE DateValue = @interestingDate) as CurrentDate
ON Calendar.RelativeMonthCount <=
(CurrentDate.RelativeMonthCount )
AND Calendar.RelativeMonthCount >=
(CurrentDate.RelativeMonthCount -10)
ON cast(SalesOrderHeader.ShipDate as date)= Calendar.DateValue
GROUP BY Calendar.Year, Calendar.Month
ORDER BY Calendar.Year, Calendar.Month;
This query will give you items that shipped in the previous ten months, plus the current month, grouped by month, as follows:
Year | Month | OrderCount |
----------------- | ----------------- | ----------------- |
2007 | 11 | 1825 |
2007 | 12 | 2228 |
2008 | 1 | 1998 |
2008 | 2 | 2034 |
2008 | 3 | 2100 |
2008 | 4 | 2058 |
2008 | 5 | 2395 |
2008 | 6 | 2360 |
2008 | 7 | 1271 |
2008 | 8 | 211 |
I included the current month by changing the first condition to <=. So, you should be able to see that the calendar table and sequence table are two excellent tables to add to almost any database. They give you the ability to take a functional problem, like getting the last day of the month or the previous time periods, and turn it into a relational question that SQL Server can chew up and spit out using the relational techniques it is built for.
Keep in mind too that creating more fiscal calendars, reporting calendars, corporate holiday calendars, sales calendars, and so forth, is easily done by adding more columns to this very same calendar table. For instance, you may have multiple business units, each with their own fiscal calendar. With the one normal calendar, you can add any variation of the fiscal calendar multiple times. It’s easy to just add those columns, since they all relate back to a normal calendar date.
One last thing to note is that the calendar table needs to be maintained. The INSERT statement we wrote earlier can be changed slightly so that it can be rerun to extend the number of rows as time passes. Perhaps an Agent job runs on the first day of the year to add another set of rows. I have also used a process to keep the calendar just a few weeks into the future for a payroll system; the rules changed on occasion, so the new rows were added just a few weeks in advance based on whatever rules were then in effect.
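As a minimal sketch of that maintenance change (the rolling end date is an assumption; swap in whatever horizon your rules require), the load's WHERE clause can be paired with an anti-join so the statement is safely rerunnable, adding only the missing days:

```sql
--Sketch: rerunnable version of the calendar load's source query;
--the column expressions in the INSERT stay the same as shown earlier
WITH dates (newDateValue) AS (
    SELECT DATEADD(day,I,'17530101') AS newDateValue
    FROM   Tools.Number
)
SELECT dates.newDateValue
FROM   dates
       --roll the window forward one year past today instead of hardcoding
WHERE  dates.newDateValue BETWEEN '20000101'
           AND DATEADD(year, 1, CAST(SYSDATETIME() AS date))
  AND  NOT EXISTS (SELECT *            --only days not already loaded
                   FROM   Tools.Calendar
                   WHERE  Calendar.DateValue = dates.newDateValue);
```

Scheduled from an Agent job, this keeps the table topped up without ever duplicating a day, no matter how often it runs.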
Utility Objects
As you acquire more and more experience over the years (which is loosely synonymous with the many failures you will encounter that teach you lessons), you will undoubtedly end up with a toolbox full of various tools that you use quite frequently. Some of these tools will be used in an ad hoc manner when you are dealing with the same problems over and over again. I personally like to keep a set of files/projects stored in my Dropbox/SkyDrive folders for easy accessibility when I am working, writing, or just poking around at SQL Server to see what I can see, whether at the office, at home, or at a client site.
Beyond the type of utilities that are used infrequently, the toolbox will contain a rather large set of tools that become part of the regular processing of the system. For these types of tools, the logistics of their use can pose a problem. How do you make sure that they are usable by every database as needed, allow for the differences that will occur in every system, and keep your databases portable so that you can move them to a new server with the least amount of pain? In the old days of the twentieth century, we would create a system object in the master database prefixed with sp_ (and mark it as a system object) and diligently try to maintain all servers as identically as possible. While this is still possible, we found too often that when we wanted to change the object, we couldn't, because with N systems using it, we usually found N different ways the object was being used. Then you would have one-off procedures spread around the enterprise, with names that referenced a database that might exist on only a single server (and, over time, might not even exist anymore). As time passed, we stopped using master and started adding a utility database that was managed on every server. Ideally these were identical, but over time, the DBAs and/or developers used the database for "temp" purposes that again became unwieldy and decidedly not temporary.
What seems to be the best answer is to create a schema in each database that contains the tools that database actually uses. By putting the objects in each database individually, it is easier to decide whether an addition to an object should be a new object specific to the single database or simply a new version of the existing object. If it is just a new version, the systems that were using the object would take the change as an upgrade after due testing. In either case, each database becomes more stand-alone, so moving from server A to server B means a lot less preparation on server B to meet the needs of the database. A few SQL Agent jobs may need to be created to execute the utility code on a schedule, but that is about it.
For examples, I chose a few of the types of utilities that I find useful in the databases I design. I will present one example of each, though this is only a start. We will look at solutions in the sections that follow.
Monitoring Objects
Keeping an eye on database usage is a very common need for the database administrator. A very typical question from upper management is how much a system is used, and particularly how much the database has grown over time. With a little bit of planning, it is easy to be prepared for these questions, and others, by doing a bit of monitoring.
In this example, I will build a table that will capture the row counts of all of the tables in the database (other than those in the sys schema) by running a stored procedure. It is set up to use the sys.partitions catalog view, because that gives a "good enough" count of the rows in all of the tables. If it is important to get extremely precise rowcounts, you could use a cursor and execute a SELECT COUNT(*) against each table in the database instead.
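A sketch of that cursor alternative (simplified, with no error handling, and with the cursor and variable names being my own) might look like this:

```sql
--Sketch: build and run a SELECT COUNT(*) per table for exact counts
DECLARE @schemaName sysname, @tableName sysname, @statement nvarchar(1000);

DECLARE tableCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT OBJECT_SCHEMA_NAME(object_id), name
    FROM   sys.tables;

OPEN tableCursor;
WHILE 1 = 1
BEGIN
    FETCH NEXT FROM tableCursor INTO @schemaName, @tableName;
    IF @@FETCH_STATUS <> 0 BREAK;

    --QUOTENAME protects against odd identifiers in the dynamic SQL
    SET @statement = N'SELECT ''' + @schemaName + N'.' + @tableName
                   + N''' AS TableName, COUNT(*) AS ExactRows FROM '
                   + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName) + N';';
    EXEC (@statement);
END;
CLOSE tableCursor;
DEALLOCATE tableCursor;
```

The obvious trade-off is that each COUNT(*) scans the table (or an index), so on large databases the sys.partitions approach is far cheaper and usually close enough.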
For the monitoring data, I will create a Monitor schema. By default, I will not give anyone rights to it, as the Monitor schema will generally be for the DBA to get a feel for system growth.
CREATE SCHEMA Monitor;
Next, I will create a table with the grain of the data at a daily level. The procedure will be created to allow you to capture rowcounts more than once a day if needed. Note too that the ObjectType column can hold more than just tables, since it might be interesting to see whether indexed views are growing in size as well. I will include only clustered or heap structures so that we get only the table structures.
CREATE TABLE Monitor.TableRowCount
(
SchemaName sysname NOT NULL,
TableName sysname NOT NULL,
CaptureDate date NOT NULL,
Rows integer NOT NULL,
ObjectType sysname NOT NULL,
Constraint PKMonitor_TableRowCount PRIMARY KEY (SchemaName, TableName, CaptureDate)
);
Then the following procedure will be scheduled to run daily:
CREATE PROCEDURE Monitor.TableRowCount$captureRowcounts
AS
-- ----------------------------------------------------------------
-- Monitor the row counts of all tables in the database on a daily basis
-- Error handling not included for example clarity
--
-- NOTE: This code expects the Monitor.TableRowCount to be in the same db as the
-- tables being monitored. Rework would be needed if this is not a possibility
--
-- 2012 Louis Davidson – [email protected] – drsql.org
-- ----------------------------------------------------------------
-- The CTE is used to set up the set of rows to put into the Monitor.TableRowCount table
WITH CurrentRowcount AS (
SELECT OBJECT_SCHEMA_NAME(partitions.object_id) AS SchemaName,
OBJECT_NAME(partitions.object_id) AS TableName,
CAST(getdate() AS date) AS CaptureDate,
SUM(rows) AS Rows,
objects.type_desc AS ObjectType
FROM sys.partitions
JOIN sys.objects
ON partitions.object_id = objects.object_id
WHERE index_id IN (0,1) --Heap (0) or Clustered (1) "indexes"
AND OBJECT_SCHEMA_NAME(partitions.object_id) NOT IN ('sys')
--the GROUP BY handles partitioned tables with > 1 partition
GROUP BY partitions.object_id, objects.type_desc)
--MERGE allows this procedure to be run more than once a day without concern;
--it will update the row for the day if it already exists
MERGE Monitor.TableRowCount
USING (SELECT SchemaName, TableName, CaptureDate, Rows, ObjectType
FROM CurrentRowcount) AS Source
ON (Source.SchemaName = TableRowCount.SchemaName
AND Source.TableName = TableRowCount.TableName
AND Source.CaptureDate = TableRowCount.CaptureDate)
WHEN MATCHED THEN
UPDATE SET Rows = Source.Rows
WHEN NOT MATCHED THEN
INSERT (SchemaName, TableName, CaptureDate, Rows, ObjectType)
VALUES (Source.SchemaName, Source.TableName, Source.CaptureDate,
Source.Rows, Source.ObjectType);
GO
Now, you execute the following procedure and check the results for the HumanResources schema in AdventureWorks2012, where we have been working:
EXEC Monitor.TableRowCount$captureRowcounts;
SELECT *
FROM Monitor.TableRowCount
WHERE SchemaName = 'HumanResources'
ORDER BY SchemaName, TableName;
Then (assuming you are still in the pristine version of the database we are working on), the output of this batch will be as follows:
SchemaName | TableName | CaptureDate | Rows | ObjectType |
----------------- | ----------------- | ----------------- | ----------------- | ----------------- |
HumanResources | Department | 2012-02-25 | 16 | USER_TABLE |
HumanResources | Employee | 2012-02-25 | 290 | USER_TABLE |
HumanResources | EmployeeDepartmentHistory | 2012-02-25 | 296 | USER_TABLE |
HumanResources | EmployeePayHistory | 2012-02-25 | 316 | USER_TABLE |
HumanResources | JobCandidate | 2012-02-25 | 13 | USER_TABLE |
HumanResources | Shift | 2012-02-25 | 3 | USER_TABLE |
If you look at the rows captured for the tables in the Monitor schema, you will see that their counts were 0, since they were captured before we added these rows. Run the procedure again, and you will notice an increase in the rows in the Monitor.TableRowCount table, notably the rows we just added. I tend to capture the rowcounts of the tables in the Monitor and Tools schemas as well. In many cases, I will then add a procedure to check for abnormal growth of rows in a table. For example, if the calendar table changes row counts (up or down), there could easily be an issue, since this table will generally grow only once, at the end of the year. You might also write a query to make sure that the monitoring table rows are increasing, and alert the admin if not.
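A sketch of such a growth check (the shrink-only rule and any alerting mechanism, such as Database Mail from an Agent job, are assumptions you would tune) can compare each capture with the previous one using LAG, available as of SQL Server 2012:

```sql
--Sketch: flag tables whose counts shrank since the previous capture
WITH ChangeSet AS (
    SELECT SchemaName, TableName, CaptureDate, Rows,
           LAG(Rows) OVER (PARTITION BY SchemaName, TableName
                           ORDER BY CaptureDate) AS PreviousRows
    FROM   Monitor.TableRowCount
)
SELECT SchemaName, TableName, CaptureDate, Rows, PreviousRows
FROM   ChangeSet
WHERE  PreviousRows IS NOT NULL
  AND  Rows < PreviousRows; --or whatever growth rule fits each table
```

For a table like Tools.Calendar, you might invert the rule and flag any change at all outside the expected year-end load.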
In a high number of the systems I work with, data is constantly being moved around, sometimes in very large amounts. I almost always make sure that relationships, check constraints, unique constraints, and so on are in place, as a way to make sure that the data being loaded meets the quality standards that the users demand.
However, constraints can really slow down the loading of data, so quite often the loading program (like SSIS) disables your constraints to load the data faster. Unfortunately, it doesn't always re-enable the constraints after it is done loading. So I created the following procedure to re-enable the constraints on some or all of the tables in the database. I will put the procedure in a Utility schema and restrict access to sysadmin users only (even going so far as to use a DENY permission for non-sysadmin access).
CREATE SCHEMA Utility;
GO
CREATE PROCEDURE Utility.Constraints$ResetEnableAndTrustedStatus
(
@table_name sysname = '%',
@table_schema sysname = '%',
@doForeignKeyFlag bit = 1,
@doCheckFlag bit = 1
) as
-- ----------------------------------------------------------------
-- Enables disabled foreign key and check constraints, and sets
-- trusted status so optimizer can use them
--
-- NOTE: This code works on the constraints of the database in which it
-- resides. Rework would be needed to target another database
--
-- 2012 Louis Davidson – [email protected] – drsql.org
-- ----------------------------------------------------------------
BEGIN
SET NOCOUNT ON;
DECLARE @statements cursor; --use to loop through constraints to execute one
--constraint for individual DDL calls
SET @statements = CURSOR FOR
WITH FKandCHK AS (SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS schemaName,
OBJECT_NAME(parent_object_id) AS tableName,
NAME AS constraintName, Type_desc AS constraintType,
is_disabled AS DisabledFlag,
(is_not_trusted + 1) % 2 AS TrustedFlag
FROM sys.foreign_keys
UNION ALL
SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS schemaName,
OBJECT_NAME(parent_object_id) AS tableName,
NAME AS constraintName, Type_desc AS constraintType,
is_disabled AS DisabledFlag,
(is_not_trusted + 1) % 2 AS TrustedFlag
FROM sys.check_constraints )
SELECT schemaName, tableName, constraintName, constraintType,
DisabledFlag, TrustedFlag
FROM FKandCHK
WHERE (TrustedFlag = 0 OR DisabledFlag = 1)
AND ((constraintType = 'FOREIGN_KEY_CONSTRAINT' AND @doForeignKeyFlag = 1)
OR (constraintType = 'CHECK_CONSTRAINT' AND @doCheckFlag = 1))
AND schemaName LIKE @table_Schema
AND tableName LIKE @table_Name;
OPEN @statements;
DECLARE @statement varchar(1000), @schemaName sysname,
@tableName sysname, @constraintName sysname,
@constraintType sysname,@disabledFlag bit, @trustedFlag bit;
WHILE 1=1
BEGIN
FETCH FROM @statements INTO @schemaName, @tableName, @constraintName,
@constraintType, @disabledFlag, @trustedFlag;
IF @@FETCH_STATUS <> 0
BREAK;
BEGIN TRY -- will output an error if it occurs but will keep on going
--so other constraints will be adjusted
IF @constraintType = 'CHECK_CONSTRAINT'
SELECT @statement = 'ALTER TABLE ' + @schemaName + '.'
+ @tableName + ' WITH CHECK CHECK CONSTRAINT '
+ @constraintName;
ELSE IF @constraintType = 'FOREIGN_KEY_CONSTRAINT'
SELECT @statement = 'ALTER TABLE ' + @schemaName + '.'
+ @tableName + ' WITH CHECK CHECK CONSTRAINT '
+ @constraintName;
EXEC (@statement);
END TRY
BEGIN CATCH --output statement that was executed along with the error number
SELECT 'Error occurred: ' + CAST(ERROR_NUMBER() AS varchar(10)) + ':' +
error_message() + char(13) + char(10) + 'Statement executed: ' +
@statement;
END CATCH
END;
END;
GO
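Using the procedure is then a single call after a load; for example, you might run it against one schema and verify that nothing is left disabled or untrusted (the schema name here is just an illustration):

```sql
--re-enable and retrust the constraints in the Sales schema
EXEC Utility.Constraints$ResetEnableAndTrustedStatus
     @table_schema = 'Sales';

--verify: should return no rows if all foreign keys are enabled and
--trusted (check sys.check_constraints the same way)
SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS SchemaName,
       name AS ConstraintName
FROM   sys.foreign_keys
WHERE  is_disabled = 1 OR is_not_trusted = 1;
```

Trusted status matters beyond correctness: the optimizer can only use a constraint to simplify plans when it is trusted, which is why the procedure uses WITH CHECK CHECK rather than a plain re-enable.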
I have several more of these available in the downloads of my web site (drsql.org) to manage (and mostly drop) all types of objects in bulk (constraints, indexes, etc) for when you need to remove the constraints from a table, often in an attempt to update the structure. I keep a model database/structure snapshot of how the database should look, and then I can remove anything I need to and simply add it back using a comparison tool.
In many systems, you will find you need to watch the activities occurring in the database. Back in Chapter 7, we implemented an audit trail using triggers and chose between using the source table's schema or some form of utility schema. In this section, the type of logging we will do is a more generic form, intended more for the DBAs to see what errors have been occurring.
As an example, one thing we often want to log is errors. A common goal these days is to make sure that no errors occur at the database level, and logging any errors that do occur to a table helps you see where you have recurring issues. Probably the most interesting way this has ever helped me was once when a programmer had simply ignored all return values from SQL calls, so a large number of calls to the system were failing, but the client never realized it. (In Appendix B, I employ this functionality in the trigger templates that I provide.)
The Utility.ErrorLog$Insert procedure is used to log to a table the errors that occur, giving you a history of errors. I do this because, in almost every case, an error that occurs in a trigger is a bad thing: the fact that the client sends data that causes the trigger to fail should be treated as a bug and fixed. In stored procedures, this may or may not be the case, as stored procedures can be written to do things that may work or may fail in some situations. This is a very broad statement and in some cases may not be true, so you can adjust the code as fits your needs.
The DDL for the table is as follows:
CREATE TABLE Utility.ErrorLog(
    ErrorLogId int NOT NULL IDENTITY CONSTRAINT PKErrorLog PRIMARY KEY,
    Number int NOT NULL,
    Location sysname NOT NULL,
    Message varchar(4000) NOT NULL,
    LogTime datetime2(3) NULL
          CONSTRAINT DFLTErrorLog_error_date DEFAULT (SYSDATETIME()),
    ServerPrincipal sysname NOT NULL
          --use original_login to capture the user name of the actual user,
          --not a user they have impersonated
          CONSTRAINT DFLTErrorLog_error_user_name DEFAULT (ORIGINAL_LOGIN())
);
Then we create the following procedure, which can be called from other procedures and triggers whenever you need to log that an error occurred:
CREATE PROCEDURE Utility.ErrorLog$Insert
(
    @ERROR_NUMBER int,
    @ERROR_LOCATION sysname,
    @ERROR_MESSAGE varchar(4000)
) AS
-- ----------------------------------------------------------------
-- Writes a row to the error log. If an error occurs in the call (such as a NULL
-- value), it writes a row to the error table noting that fact. If that call also
-- fails, an error will be returned.
--
-- 2012 Louis Davidson – [email protected] – drsql.org
-- ----------------------------------------------------------------
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        INSERT INTO Utility.ErrorLog(Number, Location, Message)
        SELECT @ERROR_NUMBER, COALESCE(@ERROR_LOCATION,'No Object'), @ERROR_MESSAGE;
    END TRY
    BEGIN CATCH
        INSERT INTO Utility.ErrorLog(Number, Location, Message)
        VALUES (-100, 'Utility.ErrorLog$insert',
                'An invalid call was made to the error log procedure ' +
                ERROR_MESSAGE());
    END CATCH;
END;
Then we test the error handler with a simple test case, as follows:
--test the error block we will use
BEGIN TRY
    THROW 50000,'Test error',16;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    --[Error logging section]
    DECLARE @ERROR_NUMBER int = ERROR_NUMBER(),
            @ERROR_PROCEDURE sysname = ERROR_PROCEDURE(),
            @ERROR_MESSAGE varchar(4000) = ERROR_MESSAGE();
    EXEC Utility.ErrorLog$Insert @ERROR_NUMBER, @ERROR_PROCEDURE, @ERROR_MESSAGE;
    THROW; --will halt the batch or be caught by the caller's catch block
END CATCH;
This returns the error we threw.
Msg 50000, Level 16, State 16, Line 3
Test error
And checking the ErrorLog, you can see that the error is logged. (Say that three times fast. I dare you.)
SELECT *
FROM Utility.ErrorLog;
This returns the following:
ErrorLogId | Number | Location  | Message    | LogTime                 | ServerPrincipal
---------- | ------ | --------- | ---------- | ----------------------- | --------------------
1          | 50000  | No Object | Test error | 2012-02-01 22:18:25.615 | DENALI-PC\AlienDrsql
This basic error logging procedure can make it much easier to understand what has gone wrong when a user has an error. Expand your own system to meet your organization’s needs, but having an audit trail will prove invaluable when you find out that certain types of errors have been going on for weeks and your users “assumed” you knew about it!
The only real downside to logging in this manner is transactions. You can log all you want, but if the logging procedure is called within a transaction, and that transaction is rolled back, the log rows will also be rolled back. To write to a log that isn't affected by transactions, you can use the xp_logevent extended stored procedure in the error handler to write to the Windows Event Log instead. This method can be handy if you have deeply nested errors, in which all the Utility.ErrorLog rows get rolled back due to external transactions.
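A minimal sketch of that approach follows, reworking the earlier test block to write to the Windows Event Log rather than a table. Note that xp_logevent lives in master, requires elevated permissions (membership in sysadmin or the master db_owner role, so it may not fit every security model), and accepts error numbers of 50,000 or greater; the message format here is my own illustration:

```sql
--A sketch of transaction-proof logging: the Event Log write is not undone
--when the surrounding transaction rolls back.
BEGIN TRY
    THROW 50000,'Test error',16;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    DECLARE @msg varchar(4000) =
        'Error in ' + COALESCE(ERROR_PROCEDURE(),'No Object')
        + ': ' + ERROR_MESSAGE();
    --error numbers passed to xp_logevent must be 50000 or greater
    EXEC master.dbo.xp_logevent 50000, @msg, 'ERROR';
    THROW;
END CATCH;
```

In practice, you might log to both the table and the Event Log, so the table gives you easy querying for the common case and the Event Log preserves the rows that transactions would otherwise erase.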
Other Possibilities…
In the end, your database's particular needs will determine exactly what you put into every (or at least more than one) database in your organization. In practice, I tend to see some utilities positioned in a database that are used server-wide, for backups, generic index maintenance, and other sorts of generic maintenance.
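As one hedged example of the "generic index maintenance" sort of utility mentioned above (a sketch only, with my own 30-percent threshold, not a production maintenance solution), the following lists indexes in the current database that are fragmented enough to be rebuild or reorganize candidates:

```sql
--Sketch of a generic index maintenance check: list indexes in the current
--database with more than 30 percent fragmentation (an illustrative threshold).
SELECT OBJECT_SCHEMA_NAME(ips.object_id) + '.'
           + OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent
FROM   sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
       JOIN sys.indexes AS i
           ON  i.object_id = ips.object_id
           AND i.index_id = ips.index_id
WHERE  ips.avg_fragmentation_in_percent > 30
  AND  ips.index_id > 0; --skip heaps
```

A utility like this can live in a Utility schema in each database, or in a central maintenance database, depending on how self-contained you want each database container to be.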
In my practice, anytime the application and/or any database code references code or data from within the database, everything gets created in the database with the other code. Some other types of concepts that I have seen added to the database to enhance the application are:
As we said, the sky is the limit; use your imagination as to how you can build up your code base in a way that is reusable and also will enhance your application’s needs.
Summary
In most chapters, we end with a section covering best practices, but in this chapter there really is only one primary best practice: keep your code encapsulated within the bounds of each database. Even if you use the same code in every database and it is the same now, it may not remain that way forever. The biggest lesson we should have learned from our procedural programming cousins is that trying to manage more than one user of the same copy of code leads to an excessively warm condition where the gatekeeper wears red pajamas and carries a pitchfork, since every user needs to be updated simultaneously, or you end up with bunches of slightly different copies of code and no real idea of who is using what.
The main thrust of the chapter was to demonstrate a number of interesting tools you can use to improve your databases for the convenience of the programmers and DBAs, as well as a number of ideas of how you might start to expand your own database implementations, including:
The goal is now, and should always be, to make the database its own universe. With the new contained database features in SQL Server 2012, and SQL Azure gaining traction, the database container is going to be closing up tighter than your grandmother's Tupperware casserole keeper. And even if it weren't, the more of the functionality that ends up in the database, the easier it will be to test that everything works and that a change in one system will have no effect on another. And that should mean fewer questions at 3:00 a.m. about why something failed because one database wasn't available when another needed it. That can't be a bad thing, can it?