CHAPTER 12


Reusable Standard Database Components

The purpose—where I start—is the idea of use. It is not recycling, it’s reuse.

—Issey Miyake, Japanese fashion designer and fragrance connoisseur

As we near the end of the database design process, the database is pretty much completed from a design standpoint. We have spent time looking at performance, concurrency, and security patterns that you can follow to help implement the database in a manner that will work well under almost any typical OLTP-style load. In this chapter (and again somewhat in the next), we are going to look at applying “finishing touches” to the database that can enhance the user experience and assist with querying and maintaining the database. In Chapter 9, I introduced the concept of a contained database to help maintain a secure and portable database, and here I will present some additional add-in capabilities and expansion-point ideas to make your database that much more usable.

The reality of database design is that databases are rarely complete cookie-cutter affairs. Most companies, even when they buy a third-party package to implement some part of their business, are going to end up making (in many cases substantial) customizations to the database to fit their needs. However, most projects will follow a common pattern that will look like something that has been done before. It isn’t plagiarism to start from some initial design, and it definitely isn’t plagiarism to use common objects that have been used by thousands of other people. If you are starting a very large project, you may want to look at previous models or perhaps even prebuilt “universal data models,” such as those in Len Silverston’s series of books, the first of which is The Data Model Resource Book: A Library of Universal Data Models for All Enterprises (Wiley, 2001) (perhaps the only book on database design with a longer title than the book you hold in your hands right now). Karen Lopez (@datachick on Twitter) frequently speaks on the subject of universal models in the PASS universe that I am often involved with. Even these universal models may only be useful as starting points to help you map from your “reality” to a common view of a given sort of business.

In this chapter, however, I want to explore a few database constructs that I find to be useful and almost always the same for every database I create (where I have complete control, obviously!). Not every database will contain all of what I will present, but when I need a common feature I will use the exact same code in every database, with the obvious caveat that I am constantly looking for new ways to improve almost everything I use over time (not to mention it gives me something to write about when I run out of Lego sets to build). Hence, sometimes a database may use an older version of a feature until it can be upgraded. I will cover the following topics:

  • Numbers table: A table of numbers, usually integers, that can be put to a number of interesting uses (not many of them mathematics-related).
  • Calendar table: A table where every row represents a day, assisting in grouping data for queries, both for reports and operational usage.
  • Utility objects: Utilities to monitor usage of the system; extended DDL to support operations that aren’t part of the base DDL in T-SQL. Every programmer has utility objects that they use to make their job easier.
  • Logging objects: Utilities to log the actions of users in the database, generally for system management reasons. A common use is an error log to capture when and where errors are occurring.
  • Other possibilities: I will present a list of additional ideas for extending your databases in ways that give you independent databases with a common implementation.

Not every database will look alike—even two that do almost the exact same thing will rarely be all that alike unless the designer is the same—but by following the patterns of implementation I have discussed throughout the book thus far and the practices I will discuss in this chapter, you can produce databases that are sufficiently similar to enable the people supporting your work to easily figure out what you had in mind.

If you are dealing with a third-party system that forbids adding any of your own objects, even in a schema that is separated from the shipped schemas, don’t think that everything I am saying here doesn’t apply to you. All of the example code presented supposes a single-database approach, but if you cannot follow that approach, another common approach is to create a companion database where you locate the code you need to access the third-party database from the database tier. You would need to slightly rework some examples presented in this chapter to use that approach, but that rework would be minimal.

For the examples in this chapter, I am going to use a copy of the new SQL Server 2016 WideWorldImporters database (which we have used in previous chapters already) to stick to the supposition of the chapter that you should place the tables in the database with the data you are working with. If you are working with a community version of WideWorldImporters that you cannot modify, you can build your own companion database for the examples. The examples are easily ported back to AdventureWorks if you are using it with an earlier version of SQL Server as well. I will include a comment in queries to note where the data comes specifically from that database. In cases where cross-database access would not be trivial, I will note that in the code with a comment.

Note  The code in this chapter will all be presented using on-disk tables. The objects from the first two sections will have in-memory OLTP versions in the download.

Numbers Table

A numbers table is a precalculated table of some number sequence, with the most typical being non-negative integers. The name “numbers” is pretty open ended, but getting so specific as nonNegativeIntegers is going to subject you to ridicule by the other programmers on the playground. In previous editions of the book I have used the name sequence, but with the addition of the sequence object in SQL Server 2012, the name “numbers” was the next best thing. We will use the numbers table when we need to work with data in an ordered manner, particularly a given sequence of numbers. For example, if you needed a list of the top ten products sold and you only sold six, you would have to somehow manufacture four more rows for display. Having a table where you can easily output a sequence of numbers is going to be a very valuable asset at times indeed.
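As a quick sketch of that top-ten scenario (using the Tools.Number table we will create later in this section, and a handful of made-up product rows purely for illustration), a LEFT JOIN from the numbers table manufactures the missing ranks:

;WITH topProduct AS
(
    --hypothetical "top products sold" set, containing only six products
    SELECT ProductName, SalesCount,
           ROW_NUMBER() OVER (ORDER BY SalesCount DESC) AS RankNbr
    FROM   (VALUES ('Widget',100),('Gadget',90),('Sprocket',80),
                   ('Cog',70),('Gear',60),('Flange',50))
                                 AS sales (ProductName, SalesCount)
)
SELECT Number.I AS RankNbr, topProduct.ProductName, topProduct.SalesCount
FROM   Tools.Number
         LEFT JOIN topProduct
            ON topProduct.RankNbr = Number.I
WHERE  Number.I BETWEEN 1 AND 10
ORDER  BY Number.I;

Ranks 7 through 10 come back with NULL product values, giving the display layer the ten rows it wanted.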

While you can make the numbers table contain any numbers you may need, usually it is just a simple table of non-negative integers from 0 to some reasonable limit, where reasonable is more or less how many you find you need. I generally load mine by default up to 99999 (99999 gives you full five digits and is a very convenient number for the query I will use to load the table). With the algorithm I will present, you can easily expand to create a sequence of numbers that is larger than you can store in SQL Server.

There are two really beautiful things behind this concept. First, the table of non-negative integers has some great uses for dealing with text data, as well as for doing all sorts of math. Second, you can create additional attributes or even other numbers tables that you can use to represent other sets of numbers that you find useful or interesting. For example:

  • Even or odd, prime, squares, cubes, and so on
  • Other ranges or even other grains of values, for example, (-1, -.5, 0, .5, 1)

In the examples in this section, we will look at several techniques you may find useful, possibly quite often. The following code to generate a simple numbers table of integers is pretty simple, though it looks a bit daunting the first time you see it. It is quite fast to execute in this form, but no matter how fast it may seem, it is not going to be faster than querying a table that has the sequence of numbers precalculated and stored ahead of time (not to mention that precalculation lets you add attributes that are not easily calculated).

;WITH digits (I) AS
            (--set up a set of numbers from 0-9
              SELECT I
              FROM  (VALUES (0),(1),(2),(3),(4),
                            (5),(6),(7),(8),(9)) AS digits (I))
,integers (I) AS (
        SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I)
              -- + (10000*D5.I) + (100000*D6.I)
        FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
                CROSS JOIN digits AS D4
              --CROSS JOIN digits AS D5 CROSS JOIN digits AS D6
                )
SELECT I
FROM   integers
ORDER  BY I;

This code will return a set of 10,000 rows (the values 0 through 9999), as follows:

I
-----------
0
1
2
...
9998
9999

Uncommenting the code for the D5 and D6 tables will give you an order of magnitude increase for each, up to 1,000,000 rows (0 through 999,999). The code itself is pretty interesting. Breaking the code down, you get the following:

;WITH digits (I) AS
            (--set up a set of numbers from 0-9
              SELECT I
              FROM  (VALUES (0),(1),(2),(3),(4),
                            (5),(6),(7),(8),(9)) AS digits (I))

This is simply a set of ten rows from 0 to 9. The next bit is where the true brilliance begins. (No, I am not claiming I came up with this. I first saw it on Erland Sommarskog’s web site a long time ago, using a technique I will show you in a few pages to split a comma-delimited string.) You cross-join the first set over and over, multiplying each level by a greater power of 10. The result is that you get one permutation for each number. For example, since 0 is in each set, you get one permutation that results in 0. You can see this better in the following smallish set:

;WITH digits (I) AS (--set up a set of numbers from 0-9
        SELECT i
        FROM   (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))
SELECT D1.I AS D1I, 10*D2.I AS D2I, D1.I + (10*D2.I) AS [Sum]
FROM digits AS D1 CROSS JOIN digits AS D2
ORDER BY [Sum];

This returns the following, and you can see that by multiplying the D2.I value by 10, you get the ten’s place repeated, giving you a very powerful mechanism for building a large set. (If you added D3, it would be 100*D3.I.) In the full query, each of the additional digit table references has another power of ten in the SELECT clause multiplier, allowing you to create a very large set (rows removed and replaced with ... for clarity and to save a tree).

D1I         D2I          Sum
----------- ------------ -----------
0           0            0
1           0            1
2           0            2
3           0            3
4           0            4
5           0            5
6           0            6
7           0            7
8           0            8
9           0            9
0           10           10
1           10           11
2           10           12
3           10           13
4           10           14
...         
6           80           86
7           80           87
8           80           88
9           80           89
0           90           90
1           90           91
2           90           92
3           90           93
4           90           94
5           90           95
6           90           96
7           90           97
8           90           98
9           90           99

This kind of combination of sets is a very useful technique in relational coding. Using the full query, you can create a sequence of numbers that you can use in a query. So, initially create a simple table named Number with a single column I (I because it is a typical value used in math to denote an index in a sequence, such as x(I), where the I denotes a sequence of values of x). The primary purpose of the Number table is to make sure you get all values in a sequence, without having to loop value by value. Create this table in a schema named Tools to contain the types of tool objects, functions, and procedures you will build in this chapter. In all likelihood, this is a schema on which you would grant EXECUTE and SELECT to public, making the tools available to any user you have given ad hoc query access.

USE WideWorldImporters;
GO
CREATE SCHEMA Tools;
GO
CREATE TABLE Tools.Number
(
    I   int CONSTRAINT PKNumber PRIMARY KEY
);
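If opening the schema up as just described fits your security model, a minimal sketch of the grant looks like the following, and it will cover the objects we add to the Tools schema as we go:

GRANT EXECUTE, SELECT ON SCHEMA::Tools TO public;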

Then load it with integers from 0 to 99999:

;WITH digits (I) AS (--set up a set of numbers from 0-9
        SELECT I
        FROM   (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))
--builds a table from 0 to 99999
,Integers (I) AS (
        SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I) + (10000*D5.I)
               --+ (100000*D6.I)
        FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
                CROSS JOIN digits AS D4 CROSS JOIN digits AS D5
                /* CROSS JOIN digits AS D6 */)
INSERT INTO Tools.Number(I)
SELECT I
FROM   Integers;

So if you wanted to count the integers between 1 and 1000 (inclusive), it is as simple as

SELECT COUNT(*)
FROM   Tools.Number
WHERE  I between 1 and 1000;

Of course, that would be a bit simplistic, and there had better be 1000 values between (inclusive) 1 and 1000, but what if you wanted the number of integers between 1 and 1000 that are divisible by 9 or 7?

SELECT COUNT(*)
FROM   Tools.Number
WHERE  I BETWEEN 1 AND 1000
  AND  (I % 9 = 0 OR I % 7 = 0);

This returns the obvious answer: 238. Sure, a math nerd could sit down and write a formula to do this, but why? And if you find you need just these values quite often, you could create a table called Tools.DivisibleByNineOrSevenNumber, or add columns to the Number table called DivisibleByNineFlag and DivisibleBySevenFlag if they were needed in the context of integers that are not divisible by 9 or 7. The simple numbers table is the most typical need, but you can make a table of any sorts of numbers that you need (prime numbers? squares? cubes?). In the downloads there is an esoteric example of what you can do with a table of numbers to do some pretty (nerdy) fun stuff, but for OLTP use, the goal will be (as we discussed in Chapter 5 on normalization) to precalculate values only when they are used often and can never change. Numbers-type tables are an excellent candidate for storing precalculated values because the sets of integer numbers and prime numbers are the same now as back in 300 BC when Euclid was working with them.
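If you did decide you needed those flag columns, a minimal sketch (the column names are the hypothetical ones from the previous paragraph) could use persisted computed columns, since the modulo expressions are deterministic:

ALTER TABLE Tools.Number
    ADD DivisibleByNineFlag AS (CASE WHEN I % 9 = 0 THEN 1 ELSE 0 END) PERSISTED,
        DivisibleBySevenFlag AS (CASE WHEN I % 7 = 0 THEN 1 ELSE 0 END) PERSISTED;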

In this section, I will present the following uses of the numbers table to get you going:

  • Determining the contents of a string: Looking through a string without looping by using the ordered nature of the numbers table to manage the iteration using relational code.
  • Determining gaps in a sequence: Having a set of data that contains all of the values allows you to use relational subtraction to find missing values.
  • Separating comma-delimited items: Sometimes data is not broken down into scalar values like you desire.
  • Stupid mathematic tricks: In the downloads, there will be a supplemental file where I take the numbers table to abstract levels, solving a fairly complex math problem that, while not terribly applicable in a practical manner, serves as an experiment to build upon if you have similar, complex problem-solving needs.

Determining the Contents of a String

As a fairly common example usage, it sometimes occurs that a value in a string you are dealing with is giving your code fits, but it isn’t easy to find what the issue is. If you want to look at the Unicode (or ASCII) value for every character in a string, you could write some looping code, or you can do something like the following:

DECLARE @string varchar(20) = 'Hello nurse!';
SELECT Number.I as Position,
       SUBSTRING(split.value,Number.I,1) AS [Character],
       UNICODE(SUBSTRING(split.value,Number.I,1)) AS [Unicode]
FROM   Tools.Number
         CROSS JOIN (SELECT @string AS value) AS split
WHERE  Number.I > 0 --No zeroth position
  AND  Number.I <= LEN(@string)
ORDER BY Position;

This returns the following:

Position    Character Unicode
----------- --------- -----------
1           H         72
2           e         101
3           l         108
4           l         108
5           o         111
6                     32
7           n         110
8           u         117
9           r         114
10          s         115
11          e         101
12          !         33

This in and of itself is interesting, and sometimes when you execute this, you might see a little square character that can’t be displayed and a really large/odd Unicode value (like 20012, picking one randomly) that you didn’t expect in your database of English-only words. What really makes the technique awesome is that not only didn’t we have to write a routine to go character by character, we won’t have to write one to go row by row either. Using a simple join, you can easily do this for a large number of rows at once, this time joining to a table in the WideWorldImporters database that can provide us with an easy example set:

SELECT People.FullName, Number.I AS position,
              SUBSTRING(People.FullName,Number.I,1) AS [char],
              UNICODE(SUBSTRING(People.FullName, Number.I,1)) AS [Unicode]
FROM   /*WideWorldImporters.*/ Application.People
         JOIN Tools.Number
               ON Number.I <= LEN(People.FullName )
                   AND  UNICODE(SUBSTRING(People.FullName, Number.I,1)) IS NOT NULL
ORDER  BY FullName;

This returns 15128 rows (one for each character in a full name) in < 1 second on a virtual machine hosted on my writing tablet (which admittedly is a beefy tablet: Surface Pro 4; 16GB; 256GB SSD):

FullName                position    char Unicode
----------------------- ----------- ---- -----------
Aahlada Thota           1           A    65
Aahlada Thota           2           a    97
Aahlada Thota           3           h    104
Aahlada Thota           4           l    108
Aahlada Thota           5           a    97
Aahlada Thota           6           d    100
Aahlada Thota           7           a    97
Aahlada Thota           8                32
.......                 ...         .    ...
Aakarsha Nookala        1           A    65
Aakarsha Nookala        2           a    97
Aakarsha Nookala        3           k    107
Aakarsha Nookala        4           a    97
Aakarsha Nookala        5           r    114
Aakarsha Nookala        6           s    115
Aakarsha Nookala        7           h    104
Aakarsha Nookala        8           a    97
.......                 ...         .    ...

With that set, you could easily start eliminating known safe Unicode values with a simple WHERE clause and find your evil outlier that is causing some issue with some process. For example, you could find all names that include a character not in the normal A–Z, space, apostrophe, or dash characters:

SELECT  People.FullName, Number.I AS Position,
              SUBSTRING(People.FullName,Number.I,1) AS [Char],
              UNICODE(SUBSTRING(People.FullName, Number.I,1)) AS [Unicode]
FROM   /*WideWorldImporters.*/ Application.People
         JOIN Tools.Number
               ON Number.I <= LEN(People.FullName )
                   AND  UNICODE(SUBSTRING(People.FullName, Number.I,1)) IS NOT NULL
WHERE  SUBSTRING(People.FullName, Number.I,1) NOT LIKE '[a-zA-Z ~''~-]' ESCAPE '~'
ORDER  BY FullName;

This returns the following:

FullName                                           Position    Char Unicode
-------------------------------------------------- ----------- ---- -----------
Abhoy PrabhupÄda                                  15               129
Bahadır Korkmaz                                   7           ±   177
Bimla PrabhupÄda                                  15               129
Deviprasad PrabhupÄda                             20               129
Himadri PrabhupÄda                                17               129
Ivica LuÄic                                       10               141
Malay PrabhupÄda                                  15               129
Sevim Aydın                                      11           ±   177
Taner Yılmaz                                      9           ±   177
Tereza PinÄakova                                  12               143
VÄ›ra Kopecka                                      3           ›  8250
VÄ›ra Stejskalova                                  3           ›  8250
Vicente Chávez                                   12           ¡   161

This can be a remarkably powerful tool when trying to figure out what data is hurting your application with some unsupported text, particularly when dealing with a stream of data from an outside source.

Finding Gaps in a Sequence of Numbers

A common issue when using a column that is supposed to have sequential values is that there can be gaps in the values. For most cases, like a surrogate key, this should not be an issue, but when troubleshooting errors it is often useful to be able to determine the missing numbers in a range (perhaps to figure out when you lost some identity values to errors). For example, say you have a table with a domain of values between 1 and 10. How might you determine if a value isn’t used? This is fairly simple; you can just do a distinct query on the used values and then check to see what values aren’t used, right? Well, how about if you had to find missing values in 20,000+ distinct values? That is not quite going to work if a lot of values aren’t used. For example, consider the Application.People table in the WideWorldImporters database. Running the following query, you can see that not every PersonId is used:

SELECT  MIN(PersonId) AS MinValue, MAX(PersonId) AS MaxValue,
        MAX(PersonId) - MIN(PersonId) + 1 AS ExpectedNumberOfRows,
        COUNT(*) AS NumberOfRows,
        MAX(PersonId) - COUNT(*) AS MissingRows
FROM    /*WideWorldImporters.*/ Application.People;

This returns the following:

MinValue    MaxValue    ExpectedNumberOfRows NumberOfRows MissingRows
----------- ----------- -------------------- ------------ -----------
1           3261        3261                 1111         2150

So you know that there are 2150 rows “missing” between PersonId values 1 and 3261 (not that there is anything wrong with that in this case, not at all). To discover these rows, take a set of values from 1 to 3261 with no gaps, and subtract the rows using the EXCEPT relational operator:

SELECT Number.I
FROM   Tools.Number
WHERE  I BETWEEN 1 AND 3261
EXCEPT
SELECT PersonId
FROM    /* WideWorldImporters.*/ Application.People;

Execute this query and you will find that 2150 rows are returned, the 2150 surrogate key values that are missing. If this table needed sequential numbers (like if this was not a surrogate key value), you would have the values you needed to fill in. This subtraction method with the Number table is very powerful, and you can use it in lots of situations where you need to find what isn’t there rather than what is.
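When the list of missing values is long, it can also be handy to collapse the output into contiguous ranges. The following is a sketch using the common gaps-and-islands technique, grouping on the difference between each missing value and its row number, built on the same EXCEPT query:

;WITH missingValues (I) AS
(
    SELECT Number.I
    FROM   Tools.Number
    WHERE  I BETWEEN 1 AND 3261
    EXCEPT
    SELECT PersonId
    FROM   /*WideWorldImporters.*/ Application.People
)
SELECT MIN(I) AS RangeStart, MAX(I) AS RangeEnd
FROM   (--values in one contiguous run share the same GroupNumber
        SELECT I, I - ROW_NUMBER() OVER (ORDER BY I) AS GroupNumber
        FROM   missingValues) AS grouped
GROUP  BY GroupNumber
ORDER  BY RangeStart;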

Separating Comma-Delimited Items

My last example that you can translate to a direct business need (particularly if you are not yet using SQL Server 2016) comes from Erland Sommarskog’s web site (www.sommarskog.se) on arrays in SQL Server, as well as Aaron Bertrand’s old ASPFAQ web site. Using this code, you can take a comma-delimited list and return it as a table of values (which is the most desirable form for data in SQL Server, in case you have just started reading this book on this very page and haven’t learned about normalization yet).

For our example, consider the string ’1,2,3’. Sometimes we need to break this apart (perhaps for a parameter or, as we will see later in the section, to fix some normalization issue). In SQL Server 2016, there is a new STRING_SPLIT function, so you can say:

SELECT *
FROM   STRING_SPLIT('1,2,3',',');

and the output will be:

value
--------------------
1
2
3

As wonderful as this is (and it is pretty wonderful), the following technique may still find its way into your code at times, especially if you cannot upgrade to SQL Server 2016 even before the next version of this book comes out!

DECLARE @delimitedList VARCHAR(100) = '1,2,3';
SELECT SUBSTRING(',' + @delimitedList + ',',I + 1,
          CHARINDEX(',',',' + @delimitedList + ',',I + 1) - I - 1) AS value
FROM Tools.Number
WHERE I >= 1
  AND I < LEN(',' + @delimitedList + ',') - 1
  AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','
ORDER BY I;

The output remains the same. The way this code works is pretty interesting in and of itself, and understanding it can help your understanding of SQL. It works by doing a substring on each row. The key is in the WHERE clause.

WHERE I >= 1
  AND I < LEN(',' + @delimitedList + ',') - 1
  AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','

The first line is there because SUBSTRING starts with position 1. The second line limits the rows from Tools.Number to no more than the length of the delimited list (with the commas added to either end). The third line includes rows only where the SUBSTRING of the value at the position returns the delimiter, in this case, a comma. So, take the following query:

DECLARE @delimitedList VARCHAR(100) = '1,2,3';
SELECT I
FROM Tools.Number
WHERE I >= 1
  AND I < LEN(',' + @delimitedList + ',') - 1
  AND SUBSTRING(',' + @delimitedList + ',', I, 1) = ','
ORDER BY I;

Executing this, you will see the following results:

I
--------------------
1
3
5

Since the list has a comma added to the beginning and end of it in the query, you will see that the positions of the commas are represented in the list. The SUBSTRING in the SELECT clause of the main query simply fetches all of the @delimitedList values up to the next comma. This sort of use of the numbers table allows you to do, in one set-based statement that is actually very fast, what at first seems like it would require a painful, iterating algorithm to touch each position in the string individually (which is fairly slow in T-SQL, and a bit less so in CLR code).

Finally, I have expanded on this technique to allow you to do this for every row in a table that needs it, by joining to the Tools.Number table on the values between 1 and the length of the string (plus delimiters). The best use for this code is to normalize a set of data where some programmer thought it was a good idea to store data in a comma-delimited list (it rarely is) so that you can use proper relational techniques to work with the data. It is actually pretty sad how often you may find this query useful.

CREATE TABLE dbo.poorDesign
(
    poorDesignId    int,
    badValue        varchar(20)
);
INSERT INTO dbo.poorDesign
VALUES (1,'1,3,56,7,3,6'),
       (2,'22,3'),
       (3,'1');

The code just takes the stuff in the WHERE clause of the previous query and moves it into JOIN criteria:

SELECT poorDesign.poorDesignId AS betterDesignId,
       SUBSTRING(',' + poorDesign.badValue + ',',I + 1,
               CHARINDEX(',',',' + poorDesign.badValue + ',', I + 1) - I - 1)
                                       AS betterScalarValue
FROM   dbo.poorDesign
         JOIN Tools.Number
            ON I >= 1
              AND I < LEN(',' + poorDesign.badValue + ',') - 1
              AND SUBSTRING(',' + poorDesign.badValue + ',', I, 1) = ',';

This returns the following:

betterDesignId betterScalarValue
-------------- ----------------------
1              1
1              3
1              56
1              7
1              3
1              6
2              22
2              3
3              1

Ah…that’s much better. Using STRING_SPLIT() we can do it this way:

SELECT poorDesign.poorDesignId AS betterDesignId, stringSplit.value AS betterScalarValue
FROM   dbo.poorDesign
          CROSS APPLY STRING_SPLIT(badValue,',') AS stringSplit;

In both outputs, each row of the output represents only a single value, not an array of values. As I have said many times throughout the book, SQL works great with atomic values, but try to get individual values out of a single column, and you get ugly code like I have just presented. The solution is an excellent one for the problem; it is the fault of the problem that the code is ugly. Now just create the table using this better design of having one row per scalar value, insert the data, and drop the badly designed table.

One thing that is good about having both methods is that if you have a real scenario with millions of rows of this dreck, you could try both. The numbers table version might allow you more leeway for tuning, and perhaps for more complex splitting needs.
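If you find yourself using this technique often on a version prior to SQL Server 2016, it may be worth wrapping it in an inline table-valued function. The following is a sketch (the Tools.String$Split name is my own invention, not a system object), using the same logic as the query above:

CREATE FUNCTION Tools.String$Split
(
    @delimitedList varchar(8000),
    @delimiter char(1) = ','
)
RETURNS TABLE
AS
RETURN (SELECT SUBSTRING(@delimiter + @delimitedList + @delimiter, I + 1,
                  CHARINDEX(@delimiter,
                       @delimiter + @delimitedList + @delimiter, I + 1)
                                                        - I - 1) AS value
        FROM   Tools.Number
        WHERE  I >= 1
          AND  I < LEN(@delimiter + @delimitedList + @delimiter) - 1
          AND  SUBSTRING(@delimiter + @delimitedList + @delimiter, I, 1)
                                                           = @delimiter);

Being an inline function, it optimizes much like the raw query did, and you can use it with CROSS APPLY just as with STRING_SPLIT (pass DEFAULT for the second parameter if you want the comma).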

I won’t create the better design, but we do need to clean up the poorDesign table with the following, lest someone stumble upon it and use it as a good idea:

DROP TABLE dbo.poorDesign;

Calendar Table

A common task that people want to know how to do is perform groupings and calculations with date values. For example, you might want sales grouped by month, week, year, or any other grouping. You can usually do this using the SQL Server date functions, but often it is costly in performance, or at least pretty messy or impossible if you want something more than a simple calendar year and month grouping. What can truly help with this process is to use a table filled with date values, commonly called a calendar table. Using a calendar table is commonplace in business intelligence/OLAP implementations (something that is covered more in Chapter 14), but it certainly can be useful in OLTP databases when you get stuck doing a confusing date range query.

Using the same form of precalculated logic that we applied to the numbers table, we can create a table that contains one row per date. I will set the date as the primary key and then have data related to the date as columns. The following is the basic date table that I currently use. You can extend it as you want to include working days, holidays, special events, and so on, to filter/group by in the same manner as you do with these columns, along with the others I will show you how to add later in the section (again, I am working in a copy of WideWorldImporters, bolting on my tools to make life easier).

CREATE TABLE Tools.Calendar
(
        DateValue date NOT NULL CONSTRAINT PKtools_calendar PRIMARY KEY,
        DayName varchar(10) NOT NULL,
        MonthName varchar(10) NOT NULL,
        Year varchar(60) NOT NULL,
        Day tinyint NOT NULL,
        DayOfTheYear smallint NOT NULL,
        Month smallint NOT NULL,
        Quarter tinyint NOT NULL
);

Note  I wanted to make several of these columns into persisted computed columns, but it turns out that the datename and datepart functions we will use to load the data are not deterministic functions due to how they have to work with regionalization, so we will store the values manually. In SQL Server 2012+, there is an additional way to format date data using FORMAT (also nondeterministic), which allows for specification of regionalization information. I didn’t change my example to use FORMAT because DATEPART and DATENAME are perfectly valid and a bit more readable methods of extracting date information from a date value, but FORMAT would allow you to reference multiple cultures in a single statement if you needed it.
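As a quick sketch of the difference, FORMAT takes an explicit culture, where DATENAME returns names based on the session language only:

SELECT FORMAT(CAST('20160101' AS date), 'MMMM', 'en-US') AS EnglishMonthName,
       FORMAT(CAST('20160101' AS date), 'MMMM', 'de-DE') AS GermanMonthName;

This returns January and Januar from the same statement.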

The question of normalization might come up again at this point, and it is a valid question. Since these values we have created can be calculated (and quite easily in most cases), why do this? Isn’t this denormalization? There are a few ways to look at it, but I would generally say that it isn’t. Each row represents a day in time, and each of the columns is functionally dependent on the date value. The fact is, the functions have to look up or calculate the date attributes in some manner, so you are actually saving that lookup cost, however minimal it might be. When you have to filter on date criteria, the benefit of the normalized design can be quite obvious. Later in the section we will extend the table to add more interesting, company-based values, which certainly are normalized in the context of a date.

Note that the name calendar is not truly consistent with our names representing a single row, at least not in a natural way, but the other names that we might use (date, dateValue, etc.) all sound either forced or hokey. Since the table represents a calendar and a calendar item generally would logically represent a day, I went with it. A stretch perhaps, but naming is pretty difficult at times, especially when working in a namespace in which the real world and SQL Server have so many established names.

The next step is to load the table with values, which is pretty much a straightforward task using the Numbers table that we just finished creating in the previous section. Using the datename and datepart functions and a few simple case expressions, you load the different values. I will make use of many of the functions in the examples, but most are very easy to understand.

WITH dates (newDateValue) AS (
        SELECT DATEADD(day,I,'17530101') AS newDateValue
        FROM Tools.Number
)
INSERT Tools.Calendar
        (DateValue ,DayName
        ,MonthName ,Year ,Day
        ,DayOfTheYear ,Month ,Quarter
)
SELECT
        dates.newDateValue as DateValue,
        DATENAME (dw,dates.newDateValue) As DayName,
        DATENAME (mm,dates.newDateValue) AS MonthName,
        DATENAME (yy,dates.newDateValue) AS Year,
        DATEPART(day,dates.newDateValue) AS Day,
        DATEPART(dy,dates.newDateValue) AS DayOfTheYear,
        DATEPART(m,dates.newDateValue) AS Month,
        DATEPART(qq,dates.newDateValue) AS Quarter
FROM    dates
WHERE   dates.newDateValue BETWEEN '20000101' AND '20200101' --set the date range
ORDER   BY DateValue;

Just like the numbers table, there are several commonly useful ways to use the calendar table. The first I will demonstrate is general grouping types of queries, and the second is calculating ranges. As an example of grouping, say you want to know how many sales had been made during each year in the Sales.Orders table in the WideWorldImporters database. This is why there is a Year column in the table:

SELECT Calendar.Year, COUNT(*) AS OrderCount
FROM   /*WideWorldImporters.*/ Sales.Orders
         JOIN Tools.Calendar
               --note, the cast here could be a real performance killer
               --consider using date columns where possible
            ON Orders.OrderDate = Calendar.DateValue --OrderDate is a date type column
GROUP BY Calendar.Year
ORDER BY Calendar.Year;

This returns the following:

Year             OrderCount
---------------- -----------
2013             19450
2014             21199
2015             23329
2016             9617

The beauty of the calendar table is that you can easily group on values that are not simple to compute in an obvious manner, all while using a natural relational coding style/technique. For example, to count the sales on Tuesdays and Thursdays:

SELECT Calendar.DayName, COUNT(*) as OrderCount
FROM   /*WideWorldImporters.*/ Sales.Orders
         JOIN Tools.Calendar
               --note, the cast here could be a real performance killer
               --consider using date columns where possible
            ON CAST(Orders.OrderDate as date) = Calendar.DateValue
WHERE DayName IN ('Tuesday','Thursday')
GROUP BY Calendar.DayName
ORDER BY Calendar.DayName;

This returns the following:

DayName    OrderCount
---------- -----------
Thursday   13421
Tuesday    13737

Tip  In many tables where the value is a datetime or datetime2 and I need to use just the date often, I will add a computed column that has the date only.
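A minimal sketch of that tip (the table and column names here are hypothetical) follows; the cast from datetime2 to date is deterministic, so the computed column can be persisted and indexed:

CREATE TABLE dbo.DemoEvent
(
    DemoEventId int NOT NULL CONSTRAINT PKDemoEvent PRIMARY KEY,
    EventTime   datetime2(0) NOT NULL,
    --date-only version of EventTime, which can be joined to
    --Tools.Calendar without a CAST in the query
    EventDate   AS (CAST(EventTime AS date)) PERSISTED
);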

OK, I see you are possibly still skeptical. What if I throw in the first Tuesday after the second Wednesday? How? Well, it is really a simple matter of building the set step by step using CTEs to build the set that you need:

;WITH onlyWednesdays AS --get all Wednesdays
(
    SELECT *,
           ROW_NUMBER()  OVER (PARTITION BY Calendar.Year, Calendar.Month
                               ORDER BY Calendar.Day) AS wedRowNbr
    FROM   Tools.Calendar
    WHERE  DayName = 'Wednesday'
),
secondWednesdays AS --limit to second Wednesdays of the month
(
    SELECT *
    FROM   onlyWednesdays
    WHERE  wedRowNbr = 2
)
,finallyTuesdays AS --finally limit to the Tuesdays after the second wed
(
    SELECT Calendar.*,
           ROW_NUMBER() OVER (PARTITION BY Calendar.Year, Calendar.Month
                              ORDER by Calendar.Day) AS rowNbr
    FROM   secondWednesdays
             JOIN Tools.Calendar
                ON secondWednesdays.Year = Calendar.Year
                    AND secondWednesdays.Month = Calendar.Month
    WHERE  Calendar.DayName = 'Tuesday'
      AND  Calendar.Day > secondWednesdays.Day
)
--and in the final query, just get the one month
SELECT Year, MonthName, Day
FROM   finallyTuesdays
WHERE  Year = 2016
  AND  rowNbr = 1;

This returns the following set of values that appears to be the result of some advanced calculations, but rather is the results of a fairly simple query (to anyone in Chapter 12 of this book, for sure!):

Year              MonthName  Day
----------------- ---------- ----
2016              January    19
2016              February   16
2016              March      15
2016              April      19
2016              May        17
2016              June       14
2016              July       19
2016              August     16
2016              September  20
2016              October    18
2016              November   15
2016              December   20

Now, utilizing another CTE, you could use this to join to the Sales.Orders table and find out sales on the first Tuesday after the second Wednesday. And that is exactly what I would do if I was being fed requirements by the mad hatter in charge of marketing (I think they prefer the term marketing analyst) and would be done in no time. However, if this date became a corporate standard-desired date (and no other jobs were available for you), I would add a column to the calendar table (maybe called FirstTuesdayAfterSecondWednesdayFlag) and set it to 1 for every date that it matches (and in the one month where that day was a holiday and they wanted it to be the Thursday after the second Wednesday after…well, the change would be a simple update).
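To sketch what that would look like (the flag and default names are the hypothetical ones from the previous paragraph), add the column with a default, and then set it with a trimmed-down version of the earlier CTE logic. Since the second Wednesday falls no later than the 14th, the Tuesday that follows it is always in the same month:

ALTER TABLE Tools.Calendar
    ADD FirstTuesdayAfterSecondWednesdayFlag bit NOT NULL
          CONSTRAINT DFLTCalendarFirstTuesdayFlag DEFAULT (0);
GO
;WITH secondWednesdays AS
(
    SELECT Year, Month, Day,
           ROW_NUMBER() OVER (PARTITION BY Year, Month ORDER BY Day) AS RowNbr
    FROM   Tools.Calendar
    WHERE  DayName = 'Wednesday'
)
UPDATE Calendar
SET    FirstTuesdayAfterSecondWednesdayFlag = 1
FROM   Tools.Calendar
         JOIN secondWednesdays
            ON secondWednesdays.Year = Calendar.Year
               AND secondWednesdays.Month = Calendar.Month
               AND secondWednesdays.RowNbr = 2
WHERE  Calendar.DayName = 'Tuesday'
  AND  Calendar.Day > secondWednesdays.Day
  AND  Calendar.Day < secondWednesdays.Day + 7;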

A realistic application of this is a company sale that routinely starts on a given day every month, where the report needs to know how sales went for four days after. So, perhaps the column would be BigSaleDayFlag, or whatever works. The goal is the same as back in Chapter 7 when we talked about data-driven design. Store data in columns; rules in code. A new type of day is just a new column and calculation that every user now has access to, without having to know the calculation that was used to create the value.

In most standard calendar table utilizations, it will be common to have, at a minimum, the following generic events/time ranges:

--Saturday or Sunday set to 1, else 0
WeekendFlag bit NOT NULL,
FiscalYear smallint NOT NULL,
FiscalMonth tinyint NULL,
FiscalQuarter tinyint NOT NULL

Almost every company has a fiscal year that it uses for its business calendar. This technique allows you to treat the fiscal time periods more or less like regular calendar dates in your code, with no modification at all. I will demonstrate this after we load the data in just a few paragraphs.

This covers the basics; now I’ll discuss one last thing you can add to the calendar table to solve the problem that really shifts this into “must-have” territory: floating windows of time.

Note  This is actually one of the few techniques that I created on my own. I am not actually claiming that I am the only person to ever do this, but I have not read about it anywhere else, and I built this myself when trying to solve a particular type of problem using the system functions to no avail.

The goal for the floating windows of time is to add a relative positioning value to the table so that a user could easily get N time periods from a point in time. Years are already contiguous, increasing numbers, so it is easy to do math with them. But it is not particularly comfortable to do math with months. I found myself often having to get the past 12 months of activity, sometimes including the current month and sometimes not. Doing math that wraps around a 12-month calendar was a pain, so to the calendar I added the following columns that I load with increasing values with no gaps:

RelativeDayCount int NOT NULL,
RelativeWeekCount int NOT NULL,
RelativeMonthCount int NOT NULL

Using these columns, I can store sequence numbers that start at an arbitrary point in time. I will use ’20000101’ here, but it is really unimportant, and negative values are not a problem either. You should never refer to the value itself, just the value’s relative position to some point in time you choose. Days are numbered from that point (negative before, positive after), as are weeks and months. This results in the following “final” calendar table:

DROP TABLE Tools.Calendar;
GO
CREATE TABLE Tools.Calendar
(
        DateValue date NOT NULL CONSTRAINT PKtools_calendar PRIMARY KEY,
        DayName varchar(10) NOT NULL,
        MonthName varchar(10) NOT NULL,
        Year varchar(60) NOT NULL,
        Day tinyint NOT NULL,
        DayOfTheYear smallint NOT NULL,
        Month smallint NOT NULL,
        Quarter tinyint NOT NULL,
        WeekendFlag bit NOT NULL,
        --start of fiscal year configurable in the load process, currently
        --only supports fiscal months that match the calendar months.
        FiscalYear smallint NOT NULL,
        FiscalMonth tinyint NULL,
        FiscalQuarter tinyint NOT NULL,
        --used to give relative positioning, such as the previous 10 months
        --which can be annoying due to month boundaries
        RelativeDayCount int NOT NULL,
        RelativeWeekCount int NOT NULL,
        RelativeMonthCount int NOT NULL
);

Last, I will reload the table with the following code:

;WITH dates (newDateValue) AS (
        SELECT DATEADD(day,I,'17530101') AS newDateValue
        FROM Tools.Number
)
INSERT Tools.Calendar
        (DateValue ,DayName
        ,MonthName ,Year ,Day
        ,DayOfTheYear ,Month ,Quarter
        ,WeekendFlag ,FiscalYear ,FiscalMonth
        ,FiscalQuarter ,RelativeDayCount,RelativeWeekCount
        ,RelativeMonthCount)
SELECT
        dates.newDateValue AS DateValue,
        DATENAME (dw,dates.newDateValue) AS DayName,
        DATENAME (mm,dates.newDateValue) AS MonthName,
        DATENAME (yy,dates.newDateValue) AS Year,
        DATEPART(day,dates.newDateValue) AS Day,
        DATEPART(dy,dates.newDateValue) AS DayOfTheYear,
        DATEPART(m,dates.newDateValue) AS Month,
        CASE
                WHEN MONTH( dates.newDateValue) <= 3 THEN 1
                WHEN MONTH( dates.newDateValue) <= 6 THEN 2
                WHEN MONTH( dates.newDateValue) <= 9 THEN 3
        ELSE 4 END AS Quarter,
        CASE WHEN DATENAME (dw,dates.newDateValue) IN ('Saturday','Sunday')
                THEN 1
                ELSE 0
        END AS WeekendFlag,
        --------------------------------------------------------------
        --the next three blocks assume a fiscal year starting in July.
        --change if your fiscal periods are different
        --------------------------------------------------------------
        CASE
                WHEN MONTH(dates.newDateValue) <= 6
                THEN YEAR(dates.newDateValue)
                ELSE YEAR (dates.newDateValue) + 1
        END AS FiscalYear,
        CASE
                WHEN MONTH(dates.newDateValue) <= 6
                THEN MONTH(dates.newDateValue) + 6
                ELSE MONTH(dates.newDateValue) - 6
         END AS FiscalMonth,
        CASE
                WHEN MONTH(dates.newDateValue) <= 3 THEN 3
                WHEN MONTH(dates.newDateValue) <= 6 THEN 4
                WHEN MONTH(dates.newDateValue) <= 9 THEN 1
        ELSE 2 END AS FiscalQuarter,
        ------------------------------------------------
        --end of fiscal year settings (fiscal year starts in July)
        ------------------------------------------------
        --these values can be anything, as long as they
        --provide contiguous values on year, month, and week boundaries
        DATEDIFF(day,'20000101',dates.newDateValue) AS RelativeDayCount,
        DATEDIFF(week,'20000101',dates.newDateValue) AS RelativeWeekCount,
        DATEDIFF(month,'20000101',dates.newDateValue) AS RelativeMonthCount
FROM    dates
WHERE  dates.newDateValue BETWEEN '20000101' AND '20200101'; --set the date range

Now I can build a query to get only weekends, grouped by FiscalYear as follows:

SELECT Calendar.FiscalYear, COUNT(*) AS OrderCount
FROM   /*WideWorldImporters.*/ Sales.Orders
         JOIN Tools.Calendar
            ON Orders.OrderDate = Calendar.DateValue --OrderDate is a date type column
WHERE    WeekendFlag = 1
GROUP BY Calendar.FiscalYear
ORDER BY Calendar.FiscalYear;

This returns the following:

FiscalYear OrderCount
---------- -----------
2013       950
2014       1723
2015       2109
2016       1782

To demonstrate the floating windows of time using the Relative____Count columns, consider that you want to count the sales for the previous two weeks. It’s not impossible to do this using the date functions perhaps, but it’s simple to do with a calendar table:

DECLARE @interestingDate date = '20140509';
SELECT Calendar.DateValue as PreviousTwoWeeks, CurrentDate.DateValue AS Today,
        Calendar.RelativeWeekCount
FROM   Tools.Calendar
           JOIN (SELECT *
                 FROM Tools.Calendar
                 WHERE DateValue = @interestingDate) AS CurrentDate
              ON  Calendar.RelativeWeekCount < (CurrentDate.RelativeWeekCount)
                  AND Calendar.RelativeWeekCount >=
                                         (CurrentDate.RelativeWeekCount -2);

This returns the following:

PreviousTwoWeeks Today      RelativeWeekCount
---------------- ---------- -----------------
2014-04-20       2014-05-09 747
2014-04-21       2014-05-09 747
2014-04-22       2014-05-09 747
2014-04-23       2014-05-09 747
2014-04-24       2014-05-09 747
2014-04-25       2014-05-09 747
2014-04-26       2014-05-09 747
2014-04-27       2014-05-09 748
2014-04-28       2014-05-09 748
2014-04-29       2014-05-09 748
2014-04-30       2014-05-09 748
2014-05-01       2014-05-09 748
2014-05-02       2014-05-09 748
2014-05-03       2014-05-09 748

From this, we can see that the previous two weeks start on Sunday (04/20/2014 and 04/27/2014) and end on Saturday (04/26/2014 and 05/03/2014). The dates 05/04/2014 to 05/09/2014 are not included because they fall in the current week (relative to the variable value). The basic structure of the query is simply to take a derived table that fetches the calendar row for the “interesting” date and then join that to the full calendar table using a range of dates in the join. In the two-week example, I used the following:

Calendar.RelativeWeekCount < (CurrentDate.RelativeWeekCount)
   AND Calendar.RelativeWeekCount >= (CurrentDate.RelativeWeekCount -2)

This is because I wanted the weeks previous to the current week, going back two weeks. Now, join the values to your sales table, and you can see sales for the past two weeks. Want to see it broken down by week? Group by RelativeWeekCount. Use the following if you wanted the previous 12 months:

DECLARE @interestingDate date = '20140509';
SELECT MIN(Calendar.DateValue) AS MinDate, MAX(Calendar.DateValue) AS MaxDate
FROM   Tools.Calendar
           JOIN (SELECT *
                 FROM Tools.Calendar
                 WHERE DateValue = @interestingDate) AS CurrentDate
              ON  Calendar.RelativeMonthCount < (CurrentDate.RelativeMonthCount)
                  AND Calendar.RelativeMonthCount >=
                                       (CurrentDate.RelativeMonthCount -12);

This returns the following:

MinDate    MaxDate
---------- ----------
2013-05-01 2014-04-30

Now you can use these dates in other criteria, either by assigning these values to a variable or (if you are one of the cool kids) by using the tables in a join to other tables. As a real example, let’s hop in the WABAC machine and look at sales in the WideWorldImporters database around May 9, 2014. (Keep in mind that the sample databases could change in the future to get more up-to-date data.)

DECLARE @interestingDate date = '20140509';
SELECT Calendar.Year, Calendar.Month, COUNT(*) AS OrderCount
FROM   /*WideWorldImporters.*/ Sales.Orders
         JOIN Tools.Calendar
           JOIN (SELECT *
                 FROM Tools.Calendar
                 WHERE DateValue = @interestingDate) as CurrentDate
                   ON  Calendar.RelativeMonthCount <=
                                           (CurrentDate.RelativeMonthCount)
                    AND Calendar.RelativeMonthCount >=
                                           (CurrentDate.RelativeMonthCount -10)
            ON Orders.ExpectedDeliveryDate = Calendar.DateValue
GROUP BY Calendar.Year, Calendar.Month
ORDER BY Calendar.Year, Calendar.Month;

This query will give you orders expected to be delivered in the previous ten months plus the current month (May 2014), grouped by month, as follows:

Year            Month  OrderCount
--------------- ------ -----------
2013           7      1969
2013           8      1502
2013           9      1664
2013           10     1617
2013           11     1531
2013           12     1545
2014           1      1774
2014           2      1595
2014           3      1561
2014           4      1742
2014           5      1867

I included the current month by changing the first condition to <=. So, you should be able to see that the calendar table and numbers table are two excellent tables to add to almost any database. They give you the ability to take a functional problem like getting the last day of the month or the previous time periods and turn it into a relational question that SQL Server can chew up and spit out using the relational techniques it is built for.
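For instance, the “last day of the month” question just mentioned reduces to a simple aggregate, sketched here:

SELECT Year, Month, MAX(DateValue) AS MonthEndDate
FROM   Tools.Calendar
GROUP  BY Year, Month
ORDER  BY Year, Month;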

Keep in mind too that you can easily create more fiscal calendars, reporting calendars, corporate holiday calendars, sales calendars, and so forth by adding more columns to this very same calendar table. For instance, you may have multiple business units, each with its own fiscal calendar. With the one normal calendar, you can add any variation of the fiscal calendar multiple times. It’s easy to just add those columns, since they all relate back to a normal calendar date.
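A sketch of that idea (the business-unit column names here are invented for illustration) is nothing more than a few additional columns that relate back to the same date rows:

ALTER TABLE Tools.Calendar
    ADD SalesDivisionFiscalYear    smallint NULL,
        SalesDivisionFiscalMonth   tinyint  NULL,
        SalesDivisionFiscalQuarter tinyint  NULL;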

One last thing to note is that the calendar table needs to be maintained. The INSERT statement we wrote earlier can be changed slightly so that it can be rerun to extend the number of rows as time passes. Perhaps an Agent job runs on the first day of the year to add another year of rows; I have also used a process that kept the calendar populated just a few weeks into the future for a payroll system. The rules changed on occasion, so the new rows were added just a few weeks in advance based on whatever rules were then in effect.
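A sketch of a simple safety net for that maintenance follows (the one-year threshold is an arbitrary assumption; adjust it to your needs). An Agent job could run it periodically and let the error surface through your alerting:

DECLARE @daysRemaining int =
           (SELECT DATEDIFF(day, CAST(SYSDATETIME() AS date), MAX(DateValue))
            FROM   Tools.Calendar);
IF @daysRemaining < 365
    RAISERROR ('Calendar table has less than a year of future dates',16,1);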

Utility Objects

As you acquire more and more experience over the years (which is loosely synonymous with the many failures you will encounter), you will undoubtedly end up with a toolbox full of various tools that you find you use quite frequently. Some of these tools will be used in an ad hoc manner when you are dealing with the same problems over and over again. I personally like to simply keep a set of files/projects stored in my Dropbox/SkyDrive folders for easy accessibility when I am working, writing, or just poking around at SQL Server to see what I can see at the office, home, or even at a client site.

Some tools, however, go beyond the type of utilities that are used infrequently. For these types of tools the logistics of their use can pose a problem. How do you make sure that they are usable by every database, as needed, and allow for the differences that will occur in every system (and keep your databases portable so that you can move them to a new server with the least amount of pain)? In the old days of the 20th century, we would make a system object in the master database that was prefixed with sp_ (and marked as a system object) and diligently maintain all servers as equally as possible. While this is still possible, we found that too often if we wanted to include a change to the object, we couldn’t do it, because with N systems using an object, we usually found N different ways the object was used. Then we would have nonstandard procedures spread around the enterprise, with names referencing a database that might exist on a single server (and over time, didn’t even exist anymore). As time passed, we stopped using master and started adding a utility database that was managed on every server. Ideally, these were identical, but over time, the DBAs and/or developers used the database for “temp” usages that again became unwieldy and decidedly not temporary.

What seems to be the best answer is to create a schema in each database that contains the tools that that database actually uses. By putting the objects in each database individually, it is easier to decide whether additions to an object should be a new object that is specific to the single database or simply a new version of an existing object. If it is just a new version, the systems that were using the object would take the change as an upgrade after due testing. In either case, each database becomes more stand-alone, so moving from server A to server B means a lot less preparation on server B trying to meet the needs of the database. A few SQL Agent jobs may need to be created to execute the code, but not lots of supporting tables and such.

For examples, I chose a few of the types of utilities that I find useful in the databases I design. I will present one example of each, though this is only a start. We will look at solutions in these areas:

  • Monitoring tools: Tools to watch what is going on with the database, such as row counts, file sizes, and so forth.
  • Extended DDL utilities: Tools used to make changes to the structure of the database, such as removing keys or indexes for load processes, often doing in a single call what SQL Server’s DDL would require multiple calls to accomplish.

Monitoring Objects

Keeping an eye on the database usage is a very common need for the database administrator. A very typical question from upper management is to find out how much a system is used, and particularly how much the database has grown over time. With a little bit of planning it is easy to be prepared for these questions and others by doing a bit of monitoring.

In the example, I will build a table that will capture the row counts from all of the tables in the database (other than the sys schema), by running a stored procedure. It is set up to use the sys.partitions catalog view, because it gives a “good enough” count of the rows in all of the tables. If it is important to get extremely precise rowcounts, you could use a cursor and do a SELECT COUNT(*) from each table in the database as well. In my real systems we have this sort of object, as well as many based on the dynamic management views (DMVs) that capture statistics about the database daily, or sometimes hourly.

For monitoring data, I will create a Monitor schema. To this I will not give rights to anyone other than the db_owner users by default, as the Monitor schema will generally be for the DBA to get a feel for system growth, not for general usage.

CREATE SCHEMA Monitor;

Next I will create a table, with the grain of the data being at a daily level. The procedure will be created to allow you to capture rowcounts more than once a day if needed. Note too that the ObjectType column can have more than just tables, since it might be interesting to see whether indexed views are growing in size as well. I will include only clustered or heap structures so that we get only the base structures.

CREATE TABLE Monitor.TableRowCount
(
        SchemaName  sysname NOT NULL,
        TableName   sysname NOT NULL,
        CaptureDate date    NOT NULL,
        Rows        int NOT NULL, --proper name, rowcount is reserved
        ObjectType  sysname NOT NULL,
        CONSTRAINT PKTableRowCount PRIMARY KEY (SchemaName, TableName, CaptureDate)
);

Then the following procedure will be scheduled to run daily:

CREATE PROCEDURE Monitor.TableRowCount$captureRowcounts
AS
-- ----------------------------------------------------------------
-- Monitor the row counts of all tables in the database on a daily basis
-- Error handling not included for example clarity
--
-- NOTE: This code expects the Monitor.TableRowCount to be in the same db as the
--       tables being monitored. Rework would be needed if this is not a possibility
--
-- 2016 Louis Davidson – [email protected] – drsql.org
-- ----------------------------------------------------------------
-- The CTE is used to set up the set of rows to put into the Monitor.TableRowCount table
WITH CurrentRowcount AS (
SELECT OBJECT_SCHEMA_NAME(partitions.object_id) AS SchemaName,
       OBJECT_NAME(partitions.object_id) AS TableName,
       CAST(getdate() AS date) AS CaptureDate,
       SUM(rows) AS Rows,
       objects.type_desc AS ObjectType
FROM   sys.partitions
          JOIN sys.objects
               ON partitions.object_id = objects.object_id
WHERE  index_id in (0,1) --Heap 0 or Clustered 1 "indexes"
AND    OBJECT_SCHEMA_NAME(partitions.object_id) NOT IN ('sys')
--the GROUP BY handles partitioned tables with > 1 partition
GROUP BY partitions.object_id, objects.type_desc)
--MERGE allows this procedure to be run more than once a day without concern;
--it will update the row for the day if one already exists
MERGE  Monitor.TableRowCount
USING  (SELECT SchemaName, TableName, CaptureDate, Rows, ObjectType
        FROM CurrentRowcount) AS Source
               ON (Source.SchemaName = TableRowCount.SchemaName
                   AND Source.TableName = TableRowCount.TableName
                   AND Source.CaptureDate = TableRowCount.CaptureDate)
WHEN MATCHED THEN  
        UPDATE SET Rows = Source.Rows
WHEN NOT MATCHED THEN
        INSERT (SchemaName, TableName, CaptureDate, Rows, ObjectType)
        VALUES (Source.SchemaName, Source.TableName, Source.CaptureDate,
                Source.Rows, Source.ObjectType);
GO
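
If you schedule the procedure with SQL Server Agent, a minimal sketch of the job creation might look like this (the job name, schedule, and database name are just for illustration):

--a minimal sketch of scheduling the capture with SQL Server Agent;
--the job and schedule names are assumptions, not part of the design
EXEC msdb.dbo.sp_add_job @job_name = N'Monitor - Capture Rowcounts';
EXEC msdb.dbo.sp_add_jobstep @job_name = N'Monitor - Capture Rowcounts',
          @step_name = N'Execute capture procedure',
          @subsystem = N'TSQL',
          @database_name = N'WideWorldImporters',
          @command = N'EXEC Monitor.TableRowCount$captureRowcounts;';
EXEC msdb.dbo.sp_add_jobschedule @job_name = N'Monitor - Capture Rowcounts',
          @name = N'Daily at midnight',
          @freq_type = 4,          --daily
          @freq_interval = 1,      --every 1 day
          @active_start_time = 0;  --00:00:00
EXEC msdb.dbo.sp_add_jobserver @job_name = N'Monitor - Capture Rowcounts';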

Now, execute the procedure manually and check the results for the Purchasing schema of the WideWorldImporters database in which we have been working:

EXEC Monitor.TableRowCount$captureRowcounts;
SELECT *
FROM   Monitor.TableRowCount
WHERE  SchemaName = 'Purchasing'
ORDER BY SchemaName, TableName;

Then (assuming you are still in the pristine version of the WideWorldImporters database we are working on), the output of this batch will be as follows:

SchemaName         TableName                      CaptureDate Rows        ObjectType
------------------ ------------------------------ ----------- ----------- --------------
Purchasing         PurchaseOrderLines             2016-06-15  8367        USER_TABLE
Purchasing         PurchaseOrders                 2016-06-15  2074        USER_TABLE
Purchasing         SupplierCategories             2016-06-15  9           USER_TABLE
Purchasing         SupplierCategories_Archive     2016-06-15  1           USER_TABLE
Purchasing         Suppliers                      2016-06-15  13          USER_TABLE
Purchasing         Suppliers_Archive              2016-06-15  13          USER_TABLE
Purchasing         SupplierTransactions           2016-06-15  2438        USER_TABLE

If you look at the rows captured for the tables in the Monitor schema itself, you will see that their counts were 0, since the counts were captured before we inserted these rows. Run the procedure again, and you will notice an increase in the count for the Monitor.TableRowCount table, reflecting the rows we just added. I tend to capture the rowcounts of the tables in the Monitor and Tools schemas as well. In many cases, I will then add a procedure to check for abnormal growth of rows in a table. For example, if the calendar table changes row counts (up or down), there could easily be an issue, since that table will generally grow only once, at the end of each year. You might also write a query to make sure that the monitoring table's rows are increasing and alert the admin if not.
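
As a sketch of such a check (the 10 percent threshold is an assumption you would tailor to your own system), the following query compares the two most recent captures for each table and returns any table whose count changed more than expected:

--flags tables whose latest count differs from the previous capture by
--more than 10 percent; the threshold is an assumption to tune
WITH OrderedCapture AS (
   SELECT SchemaName, TableName, CaptureDate, Rows,
          ROW_NUMBER() OVER (PARTITION BY SchemaName, TableName
                             ORDER BY CaptureDate DESC) AS CaptureOrder
   FROM   Monitor.TableRowCount )
SELECT Newest.SchemaName, Newest.TableName,
       Previous.Rows AS PreviousRows, Newest.Rows AS NewestRows
FROM   OrderedCapture AS Newest
          JOIN OrderedCapture AS Previous
               ON Newest.SchemaName = Previous.SchemaName
                  AND Newest.TableName = Previous.TableName
                  AND Previous.CaptureOrder = 2
WHERE  Newest.CaptureOrder = 1
  AND  ABS(Newest.Rows - Previous.Rows) > Previous.Rows * 0.10;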

Extended DDL Utilities

In many of the systems I work with, data is constantly being moved around, sometimes in very large amounts. I almost always make sure that there are foreign key relationships, check constraints, unique constraints, and so on, as a way to make sure that the data that is loaded meets the quality standards that the user demands.

However, constraints can really slow down the loading of data, so quite often the loading program (such as SSIS) disables your constraints to make the data load faster. Unfortunately, it doesn't always re-enable the constraints after it is done loading the data, or it re-enables them without revalidating the data, leaving them untrusted so the optimizer cannot make use of them. So I created the following procedure to re-enable (and retrust) the constraints on some or all of the tables in the database. I will put the procedure in a Utility schema and restrict access to sysadmin users only (even going so far as to use a DENY permission for non-sysadmin access).
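
To see the problem for yourself, the following sketch (using a hypothetical Examples.LoadTarget table, not one from WideWorldImporters) shows how the typical load pattern leaves a constraint enabled but untrusted:

--hypothetical table used only to demonstrate an untrusted constraint
CREATE SCHEMA Examples;
GO
CREATE TABLE Examples.LoadTarget
(
     LoadTargetId int NOT NULL CONSTRAINT PKLoadTarget PRIMARY KEY,
     Amount int NOT NULL
           CONSTRAINT CHKLoadTarget_AmountPositive CHECK (Amount > 0)
);
GO
--a loader typically disables the constraint before the load...
ALTER TABLE Examples.LoadTarget NOCHECK CONSTRAINT CHKLoadTarget_AmountPositive;
--...and re-enabling it without WITH CHECK leaves it untrusted
ALTER TABLE Examples.LoadTarget CHECK CONSTRAINT CHKLoadTarget_AmountPositive;

SELECT name, is_disabled, is_not_trusted
FROM   sys.check_constraints
WHERE  name = 'CHKLoadTarget_AmountPositive';
--returns is_disabled = 0, is_not_trusted = 1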

CREATE SCHEMA Utility;
GO
CREATE PROCEDURE Utility.Constraints$ResetEnableAndTrustedStatus
(
    @table_name sysname = '%',
    @table_schema sysname = '%',
    @doFkFlag bit = 1,
    @doCkFlag bit = 1
) AS
-- ----------------------------------------------------------------
-- Enables disabled foreign key and check constraints, and sets
-- trusted status so the optimizer can use them
--
-- NOTE: This code works on the constraints in the database in which the
--       procedure resides. Rework would be needed to target another database
--
-- 2016 Louis Davidson – [email protected] – drsql.org
-- ----------------------------------------------------------------
 BEGIN
      SET NOCOUNT ON;
      DECLARE @statements CURSOR; --used to loop through constraints to execute
                                  --one constraint per individual DDL call
      SET @statements = CURSOR FOR
           WITH FKandCHK AS (SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS schemaName,
                                    OBJECT_NAME(parent_object_id) AS tableName,
                                    name AS constraintName, type_desc AS constraintType,
                                    is_disabled AS DisabledFlag,
                                    (is_not_trusted + 1) % 2 AS TrustedFlag
                             FROM   sys.foreign_keys
                             UNION ALL
                             SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS schemaName,
                                    OBJECT_NAME(parent_object_id) AS tableName,
                                    name AS constraintName, type_desc AS constraintType,
                                    is_disabled AS DisabledFlag,
                                    (is_not_trusted + 1) % 2 AS TrustedFlag
                             FROM   sys.check_constraints )
           SELECT schemaName, tableName, constraintName, constraintType,
                  DisabledFlag, TrustedFlag
           FROM   FKandCHK
           WHERE  (TrustedFlag = 0 OR DisabledFlag = 1)
             AND  ((constraintType = 'FOREIGN_KEY_CONSTRAINT' AND @doFkFlag = 1)
                    OR (constraintType = 'CHECK_CONSTRAINT' AND @doCkFlag = 1))
             AND  schemaName LIKE @table_schema
             AND  tableName LIKE @table_name;
      OPEN @statements;
      DECLARE @statement varchar(1000), @schemaName sysname,
              @tableName sysname, @constraintName sysname,
              @constraintType sysname, @disabledFlag bit, @trustedFlag bit;
      WHILE 1=1
         BEGIN
              FETCH FROM @statements INTO @schemaName, @tableName, @constraintName,
                                          @constraintType, @disabledFlag, @trustedFlag;
               IF @@FETCH_STATUS <> 0
                    BREAK;
               BEGIN TRY -- will output an error if it occurs but will keep on going
                         -- so other constraints will be adjusted
                  --the same WITH CHECK CHECK CONSTRAINT syntax both enables and
                  --retrusts check constraints and foreign key constraints alike
                  IF @constraintType IN ('CHECK_CONSTRAINT','FOREIGN_KEY_CONSTRAINT')
                            SELECT @statement = 'ALTER TABLE ' + QUOTENAME(@schemaName)
                                            + '.' + QUOTENAME(@tableName)
                                            + ' WITH CHECK CHECK CONSTRAINT '
                                            + QUOTENAME(@constraintName);
                   EXEC (@statement);
              END TRY
              BEGIN CATCH --output the statement that was executed along with the error
                  SELECT 'Error occurred: ' + CAST(ERROR_NUMBER() AS varchar(10)) + ':' +
                          ERROR_MESSAGE() + CHAR(13) + CHAR(10) + 'Statement executed: ' +
                          @statement;
              END CATCH;
        END;
   END;
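
For example, after a load into the Purchasing tables, you might re-enable and retrust everything in that schema with calls like the following (the schema name here is just for illustration):

--retrust all FK and CHECK constraints in one schema
EXEC Utility.Constraints$ResetEnableAndTrustedStatus @table_schema = 'Purchasing';
--or the entire database, using the defaults
EXEC Utility.Constraints$ResetEnableAndTrustedStatus;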

I have several more of these utilities available in the downloads on my web site (www.drsql.org) to manage (and mostly drop) all types of objects in bulk (constraints, indexes, etc.) for when you need to remove the constraints from a table, often in an attempt to update the structure. I keep a model database/structure snapshot of how the database should look, so I can remove anything I need to and simply add it back using a comparison tool.

Logging Objects

In many systems, you will find you need to watch the activities that occur in the database. Back in Chapter 9, we implemented an audit trail using the audit feature. In this section, the type of logging we will do is a more generic form, intended more for the DBA to see what errors have been occurring.

As an example, one thing that we often want to log is errors. A common goal these days is to make sure that no errors occur at the database level, and logging any errors that occur to a table can help you see where you have recurring issues. Probably the most interesting way this ever helped me was when a programmer had simply ignored all return values from SQL calls, so a large number of calls to the system were failing, but the client never realized it. (In Appendix B, I will employ this functionality in the trigger templates that I provide.)

The ErrorHandling.ErrorLog$Insert procedure is used to log the errors that occur to a table, to give you a history of the errors that have occurred. I do this because, in almost every case, an error that occurs in a trigger is a bad thing; the fact that the client sends data that causes the trigger to fail should be treated as a bug and fixed. In stored procedures, this may or may not be the case, as stored procedures can be written to do things that may work or may fail in some situations. This is a very broad statement, and in some cases may not be true, so you can adjust the code to fit your needs.

The DDL for the table is as follows:

CREATE SCHEMA ErrorHandling;
GO
CREATE TABLE ErrorHandling.ErrorLog(
        ErrorLogId int NOT NULL IDENTITY CONSTRAINT PKErrorLog PRIMARY KEY,
        Number int NOT NULL,
        Location sysname NOT NULL,
        Message varchar(4000) NOT NULL,
        LogTime datetime2(3) NULL
              CONSTRAINT DFLTErrorLog_error_date  DEFAULT (SYSDATETIME()),
        ServerPrincipal sysname NOT NULL
              --use original_login to capture the user name of the actual user
              --not a user they have impersonated
              CONSTRAINT DFLTErrorLog_error_user_name DEFAULT (ORIGINAL_LOGIN())
);

Then we create the following procedure, which can be called from other procedures and triggers whenever you need to log that an error occurred:

CREATE PROCEDURE ErrorHandling.ErrorLog$Insert
(
        @ERROR_NUMBER int,
        @ERROR_LOCATION sysname,
        @ERROR_MESSAGE varchar(4000)
) AS
-- ------------------------------------------------------------------------------------
-- Writes a row to the error log. If an error occurs in the call (such as a NULL value),
-- it writes a row to the error log stating that an invalid call was made. If that
-- insert fails, an error will be returned to the caller
--
-- 2016 Louis Davidson – [email protected] – drsql.org
-- ------------------------------------------------------------------------------------
 BEGIN
        SET NOCOUNT ON;
        BEGIN TRY
           INSERT INTO ErrorHandling.ErrorLog(Number, Location,Message)
           SELECT @ERROR_NUMBER, COALESCE(@ERROR_LOCATION,'No Object'), @ERROR_MESSAGE;
        END TRY
        BEGIN CATCH
           INSERT INTO ErrorHandling.ErrorLog(Number, Location, Message)
           VALUES (-100, 'ErrorHandling.ErrorLog$Insert',
                        'An invalid call was made to the error log procedure ' +
                                     ERROR_MESSAGE());
        END CATCH;
END;

Then we test the error handler with a simple test case, as follows:

--test the error block we will use
BEGIN TRY
    THROW 50000,'Test error',1;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    --[Error logging section]
        DECLARE @ERROR_NUMBER int = ERROR_NUMBER(),
                @ERROR_PROCEDURE sysname = ERROR_PROCEDURE(),
                @ERROR_MESSAGE varchar(4000) = ERROR_MESSAGE();
        EXEC ErrorHandling.ErrorLog$Insert @ERROR_NUMBER,@ERROR_PROCEDURE,@ERROR_MESSAGE;
    THROW; --will halt the batch or be caught by the caller’s catch block
END CATCH;

This returns the error we threw:

Msg 50000, Level 16, State 1, Line 3
Test error

And checking the ErrorLog:

SELECT *
FROM  ErrorHandling.ErrorLog;

we can see that the error is logged:

ErrorLogId  Number  Location    Message       LogTime                   ServerPrincipal
----------- ------- ----------- ------------- ------------------------- ------------------
1           50000   No Object   Test error    2016-06-15 15:31:52.649   SomeUserName

This basic error-logging procedure can make it much easier to understand what has gone wrong when a user has an error (or to see when hundreds of errors have occurred and the code's TRY...CATCH blocks have hidden them from the user). Expand your own system to meet your organization's needs, but having this audit trail will prove invaluable when you find out that certain types of errors have been going on for weeks and your users "assumed" you knew about it!

The only real downside to logging in this manner is transactions. You can log all you want, but if the logging procedure is called inside a transaction, and that transaction is rolled back, the log row will be rolled back with it. To write to a log that isn't affected by transactions, you can use the xp_logevent extended stored procedure in the error handler to write to the Windows Event Log instead. This method can be handy if you have deeply nested errors and are not building reasonably simple error-handler stacks, in which case all the ErrorHandling.ErrorLog rows could be rolled back by an outer transaction.
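
As a minimal sketch (note that xp_logevent requires elevated rights, such as membership in the sysadmin role, and the error number parameter must be 50000 or greater), a CATCH block could write to the Application Event Log like this:

BEGIN TRY
    THROW 50000,'Test error',1;
END TRY
BEGIN CATCH
    --writes to the Windows Application Event Log, unaffected by any
    --transaction rollback; keep the message reasonably short
    DECLARE @msg varchar(2047) =
                 'Logged from ErrorHandling example: ' + ERROR_MESSAGE();
    EXEC master.dbo.xp_logevent 50001, @msg, 'ERROR';
    THROW;
END CATCH;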

Other Possibilities…

In the end, it is really going to be up to your database's particular needs to determine exactly what you may need to put into every (or certainly more than one) database in your organization. In practice, I tend to see some utilities positioned in a database used server-wide for backups, generic index maintenance, and other sorts of generic maintenance.

Some other types of concepts that I have seen added to the database to enhance the application are

  • Functions that aren’t implemented in SQL Server: User-defined functions have plenty of pros and cons when used in T-SQL statements (particularly in a WHERE clause) but they definitely have their place to wrap common, especially complex, functionality into a neat package. For example, consider the functionality we covered in the earlier section “Separating Comma-Delimited Items.” We might create a function Tools.String$SplitOnCommaOrSemicolon(@ValueToSplit) that encapsulates the string-splitting function for reuse that uses a couple of delimiters simultaneously. Another common example is changing a time from one time zone (often UTC) to another.
  • Security: Microsoft provides a lot of security-oriented system functions for you to determine who the user is, but sometimes they aren’t enough. For some systems, I have employed a function in a schema named Security to allow for overriding the system context information when needed. The goal is that whenever you need to know what user is making changes to your system, you can call the Security.UserName$get() function and know that it is doing the heavy lifting, either looking to the ORIGINAL_LOGIN() system function or SESSION_CONTEXT, or even using a custom-built security system.
  • Reference/demographic information: It is very common for systems to need the city, state, ZIP/postal code, phone number, and/or country information of customers/clients. Several companies (including the United States Postal Service) publish the domain data that defines the regions and format of this data, but in every case I have seen, they provide the data in arcane formats that are difficult to work with. So we create a demographics database with a schema (we use reference for ours) that we can duplicate in any system that needs this sort of data; we then have to load the data from the foreign format only once and have it in the expected format everywhere else it is needed, making it easier to share around the enterprise.
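
As a sketch of the splitting function mentioned in the first bullet (assuming SQL Server 2016's STRING_SPLIT is available at compatibility level 130, and that the Tools schema exists as created earlier in the chapter):

CREATE FUNCTION Tools.String$SplitOnCommaOrSemicolon
(
    @ValueToSplit nvarchar(4000)
)
RETURNS TABLE
AS
-- normalizes semicolons to commas, then lets STRING_SPLIT do the work
RETURN (SELECT LTRIM(RTRIM(value)) AS Value
        FROM   STRING_SPLIT(REPLACE(@ValueToSplit,';',','), ','));
GO
--example usage, returning three rows: apple, pear, plum
SELECT Value
FROM   Tools.String$SplitOnCommaOrSemicolon(N'apple,pear; plum');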

As I said, the sky is the limit; use your imagination as to how you can build up your code base in a way that is reusable and also will enhance your application’s needs.

Summary

Most chapters in this book end with a section covering best practices, but in this chapter there really is only one primary best practice: keep your code encapsulated within the bounds of each database and maintain a library of SQL that you can reuse. Even if you use the same code in every database and it is the same now, it may not remain that way forever. The biggest lesson we should have learned from our procedural programming cousins is that trying to share a single copy of code among many users leads to an excessively warm condition where the gatekeeper wears red pajamas and carries a pitchfork, since every user needs to be updated simultaneously; otherwise, you end up with bunches of slightly different copies of code and no real idea of who is using what.

The main thrust of the chapter was to demonstrate a number of interesting tools you can use to improve your databases for the convenience of the programmers and DBAs, as well as a number of ideas of how you might start to expand your own database implementations, including:

  • Numbers table: A table of numbers that you can use to perform actions where a sequence of values might be useful. For example, finding gaps in another sequence of values (like identity values).
  • Calendar table: A table that can help you implement methods of grouping date data using typical relational methods, both for normal date groupings like year, month, etc., and for custom times like corporate fiscal calendars, sales, holidays, etc. While the most common usage might be for data warehouse/reporting systems (which you will see in Chapter 14), having the calendar table available in your OLTP database can be very useful.
  • Utilities: Conceptual utilities to monitor usage of the system and extended DDL to support operations that aren’t part of the base DDL in T-SQL. Every programmer has utilities that they use to make their job easier. Having them accessible in the database means that you can always be sure that the version that the programmer expected to be there will be.
  • Logging actions: Utilities to log the actions of users in the database, generally for system management reasons. A common use is an error log to capture when and where errors are occurring.
  • Any other use that you can dream up to make it easier for you, the programmers, and the DBAs to work with the database on any server, from development, to QA, and production. The more add-ins you can create that can make the experience across database implementations more consistent, the better off you will be once your database is out in the world being used and, of course, maintained for years (likely decades) to come.

The goal is now and should always be to make the database its own universe. With the contained database features introduced in SQL Server 2012, and Azure SQL Database gaining traction, the database container is going to be closing up tighter than your grandmother's Tupperware casserole keeper. And even if it weren't, the more of the functionality that ends up in the database, the easier it will be to test that everything works and that a change in one system will have no effect on another. And that should mean fewer questions at 3:00 AM about why something failed because one database wasn't available when another needed it. That can't be a bad thing, can it?
