CHAPTER 7


Expanding Data Protection with Check Constraints and Triggers

Safety is something that happens between your ears, not something you hold in your hands.

—Jeff Cooper, US Marine, Creator of the modern technique of firing a handgun

One of the weirdest things I see in database implementations is that people spend tremendous amounts of time designing the correct database storage (or, at least, what seems like tremendous amounts of time to them) and then just leave the data unprotected, with tables treated more or less like buckets that will accept anything, opting to let code outside the database layer do all of the data protection. Honestly, I do understand the allure: the more constraints you apply, the harder development is in the early stages of the project, and the programmers honestly do believe that they will catch everything. The problem is that there is rarely a way to be 100% sure that all code written will always enforce every rule.

The second argument against using automatically enforced data protection is that programmers want complete control over the errors they will get back and over what events may occur that can change data. I am for this as well, as long as the result can be completely trusted. When the table itself says no bad data, you can be sure that it contains no bad data (at least as far as you have defined what “bad data” means). And if the user interface has to duplicate the rules that have been specified in the requirements, that makes perfect sense, because the data layer’s error messaging is atrocious, even using a few techniques to map error messages to friendlier descriptions. We don’t live in a mainframe, batch-processing world anymore, but there are many things that the data layer can handle perfectly that external code cannot (certainly not without holding locks on all of the data in the database, that is).

Perhaps, in an ideal world, you could control all data input carefully, but in reality, the database is designed and then turned over to the programmers and users to “do their thing.” Those pesky users immediately exploit any weakness in your design to meet the requirements that they “thought they gave you in the first place.” Every time I’ve forgotten to apply a UNIQUE constraint in a place where one should be, data duplications have started to occur. Ultimately, user perception is governed by the reliability and integrity of the data that users retrieve from your database. If they detect data anomalies in their data sets (usually in skewed report values), their faith in the whole application plummets faster than a skydiving elephant who packed lunch instead of a parachute.

One of the things I hope you will feel as you read this chapter (and keep the earlier ones in mind) is that, if at all possible, the data storage layer should own protection of the fundamental data integrity. We started in the previous chapter with foreign keys, unique constraints, and a few check constraints. In this chapter, I will take it farther to show some deeper examples of what can be done when the need arises.

One argument that arises regarding the concept of putting code in multiple locations (e.g., checking for a positive value both in the data entry blank and with a check constraint) is that it’s both

  • Bad for performance
  • More work

As C.S. Lewis had one of his evil characters in The Screwtape Letters note, “By mixing a little truth with it they had made their lie far stronger.” The fact of the matter is that these are, in fact, true statements from one perspective, but these two arguments miss the point. The real problem we must solve is that data can come from multiple locations:

  • Users using custom, very well-built front-end tools
  • Users using generic data manipulation tools, such as Microsoft Access
  • Routines that import data from external sources
  • Raw queries executed by data administrators to fix problems caused by user error

Each of these poses different issues for your integrity scheme. What’s most important is that each of these scenarios (with the possible exception of the second) forms part of nearly every database system developed. To best handle each scenario, the data must be safeguarded by mechanisms that do not depend on the diligence of the person or tool issuing the statement, not even that of a very careful DBA fixing data by hand.

If you decide to implement your data logic in a different tier, rather than directly in the database, you have to make sure that you implement it—and far more importantly, implement it correctly—in every single one of those clients. If you update the logic, you have to update it in multiple locations. If a client is “retired” and a new one introduced, the logic must be replicated in that new client. You’re much more susceptible to coding errors if you have to write the code in more than one place. Having your data protected in a single location helps prevent programmers from forgetting to enforce a rule in one situation, even if they remember everywhere else. If you receive an error (like you might if you try to store something illogical like 'A' + 20 and expect to get back an integer anyhow), you know that the UI needs to cover a scenario that it didn’t.

What’s more, because of concurrency, every statement is apt to fail due to a deadlock, or a timeout, or the data validated in the UI no longer being in the same state as it was even microseconds ago. In Chapter 11, we will cover concurrency, but suffice it to say that errors arising from issues in concurrency are often exceedingly random in appearance and must be treated as occurring at any time. And concurrency is the final nail in the coffin of using a client tier only for integrity checking. Unless you elaborately lock all users out of the database objects you are using, the state could change and a database error could occur. Are the errors annoying? Yes, they are, but they are the last line of defense between having excellent data integrity and something quite the opposite.

In this chapter, I will present two building blocks for enforcing data integrity in SQL Server: first, check constraints, declarative objects that allow you to define predicates that new and changed rows in a table must satisfy, and then triggers, which are stored procedure–style objects that can fire after a table’s contents have changed.

In SQL Server 2016, the native compilation model that works with in-memory objects supports check constraints and triggers. Triggers can also be written using CLR-based objects. In this chapter I will focus only on on-disk model objects. In Chapter 13, when we discuss creating code to access the tables we have created, I will cover some of the differences in code for interpreted and native modules; the differences are significant and affect what can be coded in triggers and check constraints. Native compilation will improve from version to version (and should be expected to improve continually in the Azure SQL Database product), but as of 2016, it is still quite limited. In the downloadable Appendix B, I will include a section on writing in-memory triggers.

Check Constraints

Check constraints are part of a class of the declarative data protection options. Basically, constraints are SQL Server devices that are used to enforce data integrity automatically on a single column or row. You should use constraints as extensively as possible to protect your data, because they’re simple and, for the most part, have minimal overhead.

One of the greatest aspects of all of SQL Server’s constraints (other than DEFAULT constraints) is that the query optimizer can use them to optimize queries, because the constraints tell the optimizer about some additional quality aspect of the data. For example, say you place a constraint on a column that requires that all values for that column must fall between 5 and 10. If a query is executed that asks for all rows with a value greater than 100 for that column, the optimizer will know without even looking at the data that no rows meet the criteria.
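
To see this for yourself, here is a minimal sketch (the dbo.ConstraintDemo table is purely hypothetical and not part of this chapter’s example schema). With the constraint in place and trusted, the plan for the SELECT is typically a Constant Scan, and the table is never touched:

CREATE TABLE dbo.ConstraintDemo
(
    DemoId int NOT NULL CONSTRAINT PKConstraintDemo PRIMARY KEY,
    Value  int NOT NULL CONSTRAINT CHKConstraintDemo$Value$Range
                CHECK (Value BETWEEN 5 AND 10)
);
GO
--the optimizer can tell from the trusted constraint that no row can qualify
SELECT DemoId, Value
FROM   dbo.ConstraintDemo
WHERE  Value > 100;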

SQL Server has five kinds of declarative constraints:

  • NULL: Determines if a column will accept NULL for its value. Though NULL constraints aren’t technically named constraints you add on, they are generally considered constraints.
  • PRIMARY KEY and UNIQUE constraints: Used to make sure your rows contain only unique combinations of values over a given set of key columns.
  • FOREIGN KEY: Used to make sure that any migrated keys have only valid values that match the key columns they reference.
  • DEFAULT: Used to set an acceptable default value for a column when the user doesn’t provide one. (Some people don’t count defaults as constraints, because they don’t constrain updates.)
  • CHECK: Used to limit the values that can be entered into a single column or an entire row.

We have covered NULL, PRIMARY KEY, UNIQUE, and DEFAULT constraints in enough detail in Chapter 6; they are pretty straightforward without a lot of variation in the ways you will use them. In this section, I will focus the examples on the various ways to use CHECK constraints to implement data protection patterns for your columns/rows. You use CHECK constraints to disallow improper data from being entered into columns of a table. CHECK constraints are executed after DEFAULT constraints (so you cannot specify a default value that would contradict a CHECK constraint) and INSTEAD OF triggers (covered later in this chapter) but before AFTER triggers. CHECK constraints cannot affect the values being inserted or deleted but are used to verify the validity of the supplied values.

The biggest complaint that is often lodged against constraints is about the horrible error messages you will get back. It is one of my biggest complaints as well, and there is very little you can do about it, although I will posit a solution to the problem later in this chapter. It will behoove you to understand one important thing: all DML (and DDL) statements should have error handling as if the database might give you back an error—because it might.
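
As a minimal sketch of that habit (dbo.AnyTable and its column are stand-ins for whatever object you happen to be modifying), every modification statement can be wrapped so the error is at least caught somewhere:

BEGIN TRY
    INSERT INTO dbo.AnyTable (AnyColumn)
    VALUES ('some value');
END TRY
BEGIN CATCH
    --log the error, map it to a friendlier message, or simply rethrow it
    THROW;
END CATCH;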

There are two flavors of CHECK constraint: column and table. Column constraints reference a single column and are used when the individual column is referenced in a modification. CHECK constraints are considered table constraints when more than one column is referenced in the criteria. Fortunately, you don’t have to worry about declaring a constraint as either a column constraint or a table constraint. When SQL Server compiles the constraint, it verifies whether it needs to check more than one column and sets the proper internal values.
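
For illustration, here is a quick sketch using a hypothetical table; the first CHECK is declared at the column level, while the second references two columns, so SQL Server treats it as a table constraint no matter where it is declared:

CREATE TABLE dbo.ConstraintFormDemo
(
    StartDate date NOT NULL
        CONSTRAINT CHKConstraintFormDemo$StartDate$NotAncient
            CHECK (StartDate >= '19000101'),   --column constraint
    EndDate   date NOT NULL,
    CONSTRAINT CHKConstraintFormDemo$EndDate$NotBeforeStartDate
        CHECK (EndDate >= StartDate)           --table constraint, two columns
);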

We’ll be looking at building CHECK constraints using two methods:

  • Simple expressions
  • Complex expressions using user-defined functions

The two methods are similar, but functions let you build more powerful constraints, though the code in a function can be harder to manage. In this section, we’ll take a look at some examples of constraints built using each of these methods; then we’ll take a look at a scheme for dealing with errors from constraints. First, though, let’s set up a simple schema that will form the basis of the examples in this section.

The examples in this section on creating CHECK constraints use the sample tables shown in Figure 7-1.


Figure 7-1. The example schema

To create and populate the tables, execute the following code (in the downloads, I include a simple create database for a database named Chapter7 and will put all objects in that database):

CREATE SCHEMA Music;
GO
CREATE TABLE Music.Artist
(
   ArtistId int NOT NULL,
   Name varchar(60) NOT NULL,
   CONSTRAINT PKArtist PRIMARY KEY CLUSTERED (ArtistId),
   CONSTRAINT AKArtist_Name UNIQUE NONCLUSTERED (Name)
);
CREATE TABLE Music.Publisher
(
        PublisherId              int CONSTRAINT PKPublisher PRIMARY KEY,
        Name                     varchar(20),
        CatalogNumberMask        varchar(100)
        CONSTRAINT DFLTPublisher_CatalogNumberMask DEFAULT ('%'),
        CONSTRAINT AKPublisher_Name UNIQUE NONCLUSTERED (Name)
);
CREATE TABLE Music.Album
(
       AlbumId int NOT NULL,
       Name varchar(60) NOT NULL,
       ArtistId int NOT NULL,
       CatalogNumber varchar(20) NOT NULL,
       PublisherId int NOT NULL,
       CONSTRAINT PKAlbum PRIMARY KEY CLUSTERED(AlbumId),
       CONSTRAINT AKAlbum_Name UNIQUE NONCLUSTERED (Name),
       CONSTRAINT FKArtist$records$Music_Album
            FOREIGN KEY (ArtistId) REFERENCES Music.Artist(ArtistId),
       CONSTRAINT FKPublisher$Published$Music_Album
            FOREIGN KEY (PublisherId) REFERENCES Music.Publisher(PublisherId)
);

Then seed the data with the following:

INSERT  INTO Music.Publisher (PublisherId, Name, CatalogNumberMask)
VALUES (1,'Capitol',
        '[0-9][0-9][0-9]-[0-9][0-9][0-9a-z][0-9a-z][0-9a-z]-[0-9][0-9]'),
        (2,'MCA', '[a-z][a-z][0-9][0-9][0-9][0-9][0-9]');
INSERT  INTO Music.Artist(ArtistId, Name)
VALUES (1, 'The Beatles'),(2, 'The Who');
INSERT INTO Music.Album (AlbumId, Name, ArtistId, PublisherId, CatalogNumber)
VALUES (1, 'The White Album',1,1,'433-43ASD-33'),
       (2, 'Revolver',1,1,'111-11111-11'),
       (3, 'Quadrophenia',2,2,'CD12345');

A likely problem with this design is that it isn’t normalized well enough for a complete solution. Publishers usually have a mask that’s valid at a given point in time, but everything changes. If the publishers lengthen the size of their catalog numbers or change to a new format, what happens to the older data? For a functioning system, it would be valuable to have a release-date column and a catalog number mask that is valid for a given range of dates. Of course, if you implemented the table as presented, the enterprising user, to get around the improper design, would create publisher rows such as 'MCA 1989-1990', 'MCA 1991-1994', and so on and mess up the data for future reporting needs, because then you’d have work to do to correlate values from the MCA company (and your table would not even technically be in first normal form!).
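
One way to model the date-effective mask just mentioned, shown here only as a rough sketch and not used in the rest of this chapter’s examples, is a child table that holds a mask per publisher per date range (the Album table would also need a ReleaseDate column to pick the applicable row):

CREATE TABLE Music.PublisherCatalogNumberMask
(
    PublisherId        int NOT NULL
        CONSTRAINT FKPublisher$defines$Music_PublisherCatalogNumberMask
            FOREIGN KEY REFERENCES Music.Publisher(PublisherId),
    EffectiveStartDate date NOT NULL,
    EffectiveEndDate   date NOT NULL,
    CatalogNumberMask  varchar(100) NOT NULL,
    CONSTRAINT PKPublisherCatalogNumberMask
        PRIMARY KEY (PublisherId, EffectiveStartDate)
);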

As a first example of a check constraint, consider if you had a business rule that no artist with a name that contains the word 'Pet' followed by the word 'Shop' is allowed. You could code the rule as follows (note, all examples assume a case-insensitive collation, which is almost certainly the norm):

ALTER TABLE Music.Artist WITH CHECK
   ADD CONSTRAINT CHKArtist$Name$NoPetShopNames
           CHECK (Name NOT LIKE '%Pet%Shop%');

Then, test by trying to insert a new row with an offending value:

INSERT INTO Music.Artist(ArtistId, Name)
VALUES (3, 'Pet Shop Boys');

This returns the following result:

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint "CHKArtist$Name$NoPetShopNames". The conflict occurred in database "Chapter7", table "Music.Artist", column 'Name'.

This keeps my music collection database safe from at least one ’80s band.

When you create a CHECK constraint, the WITH NOCHECK setting (the default is WITH CHECK) gives you the opportunity to add the constraint without checking the existing data in the table.

Let’s add a row for another musician who I don’t necessarily want in my table:

INSERT INTO Music.Artist(ArtistId, Name)
VALUES (3, 'Madonna');

Later in the process, it is decided that no artist with the word “Madonna” in the name should be added to the database, but if you attempt to add a check constraint

ALTER TABLE Music.Artist WITH CHECK
   ADD CONSTRAINT CHKArtist$Name$noMadonnaNames
           CHECK (Name NOT LIKE '%Madonna%');

rather than the happy “Command(s) completed successfully.” message you so desire to see, you see the following:

Msg 547, Level 16, State 0, Line 1
The ALTER TABLE statement conflicted with the CHECK constraint "CHKArtist$Name$noMadonnaNames". The conflict occurred in database "Chapter7", table "Music.Artist", column 'Name'.

Ideally, you will then change the contents of the table such that it will meet the requirements of the constraint. But suppose you want to allow this new constraint even though there is data in the table that conflicts with it, because it is deemed too costly to fix or clean up the existing data. In that case, you can specify the constraint using WITH NOCHECK rather than WITH CHECK so it can be added.

ALTER TABLE Music.Artist WITH NOCHECK
   ADD CONSTRAINT CHKArtist$Name$noMadonnaNames
           CHECK (Name NOT LIKE '%Madonna%');

The statement is executed to add the check constraint to the table definition, and using NOCHECK means that the invalid value does not affect the creation of the constraint. This is OK in some cases but can be very confusing because any time a modification statement references the column, the CHECK constraint is fired. The next time you try to set the value of the table to the same bad value, an error occurs. In the following statement, I simply set every row of the table to the same name it has stored in it:

UPDATE Music.Artist
SET Name = Name;

This produces the following error message:

Msg 547, Level 16, State 0, Line 1
The UPDATE statement conflicted with the CHECK constraint "CHKArtist$Name$noMadonnaNames". The conflict occurred in database "Chapter7", table "Music.Artist", column 'Name'.

“What?” most users will exclaim (well, unless they are the support person at 3 AM wondering what is going on, in which case they will upgrade it to “WHAT?!?!”). “If the value was in the table, shouldn’t it already be good?” The user is correct. A strategy to deal with changes in format, or allowing older data to meet one criterion while new data fits a different one, can be to include a time range for the values. CHECK (Name NOT LIKE '%Madonna%' OR RowCreateDate < '20141130') could be a reasonable compromise—as long as the users understand what is going on with their queries, naturally.

In many ways, using NOCHECK and leaving the values unchecked is almost worse than leaving the constraint off entirely.

Tip  If a data value could be right or wrong, based on external criteria, it is best not to be overzealous in your enforcement. The fact is, unless you can be 100% sure, when you use the data later, you will still need to make sure that the data is correct before usage.

One of the things that makes constraints excellent beyond the obvious data integrity reasons is that if the constraint is built using WITH CHECK, the optimizer can make use of this fact when building plans if the constraint didn’t use any functions and just used simple comparisons such as less than, greater than, and so on. For example, imagine you have a constraint that says that a value must be less than or equal to 10. If, in a query, you look for all values of 11 and greater, the optimizer can use this fact and immediately return zero rows, rather than having to scan the table to see whether any value matches.

If a constraint is built with WITH CHECK, it’s considered trusted, because the optimizer can trust that all values conform to the CHECK constraint. You can determine whether a constraint is trusted by using the sys.check_constraints catalog object:

SELECT definition, is_not_trusted
FROM   sys.check_constraints
WHERE  object_schema_name(object_id) = 'Music'
  AND  name = 'CHKArtist$Name$noMadonnaNames';

This returns the following results (with some minor formatting, of course):

definition                     is_not_trusted
------------------------------ ---------------
(NOT [Name] like '%Madonna%')  1

Make sure, if at all possible, that is_not_trusted = 0 for all rows so that the system trusts all your CHECK constraints and the optimizer can use the information when building plans.

Caution  Creating check constraints using the CHECK option (instead of NOCHECK) on a tremendously large table can take a very long time to apply, so often, you’ll feel like you need to cut corners to get it done fast. The problem is that the shortcut on design or implementation often costs far more in later maintenance costs or, even worse, in the user experience. If at all reasonable, it’s best to try to get everything set up properly, so there is no confusion.

To make the constraint trusted, you will need to clean up the data and use ALTER TABLE <tableName> WITH CHECK CHECK CONSTRAINT constraintName to have SQL Server check the constraint and set it to trusted. Of course, this method suffers from the same issue as creating the constraint WITH CHECK in the first place (mostly, it can take forever!). But without checking the data, the constraint will not be trusted, not to mention that forgetting to come back and revalidate the constraint later is all too easy. For our constraint, we can try to check the values:

  ALTER TABLE Music.Artist WITH CHECK CHECK CONSTRAINT CHKArtist$Name$noMadonnaNames;

And it will return the following error (as it did when we tried to create it the first time):

Msg 547, Level 16, State 0, Line 1
The ALTER TABLE statement conflicted with the CHECK constraint "CHKArtist$Name$noMadonnaNames". The conflict occurred in database "Chapter7", table "Music.Artist", column 'Name'.

But, if we delete the row with the name Madonna

DELETE FROM  Music.Artist
WHERE  Name = 'Madonna';

and try again, the ALTER TABLE statement will be executed without error, and the constraint will be trusted (and all will be well with the world!). One last thing you can do is to disable a constraint, using NOCHECK:

ALTER TABLE Music.Artist NOCHECK CONSTRAINT CHKArtist$Name$noMadonnaNames;

Now, you can see that the constraint is disabled by adding an additional object property:

SELECT definition, is_not_trusted, is_disabled
FROM   sys.check_constraints
WHERE  object_schema_name(object_id) = 'Music'
  AND  name = 'CHKArtist$Name$noMadonnaNames';

This will return

definition                     is_not_trusted is_disabled
------------------------------ -------------- -----------
(NOT [Name] like '%Madonna%')  1              1

Then, rerun the statement to re-enable the constraint before we continue:

ALTER TABLE Music.Artist WITH CHECK CHECK CONSTRAINT CHKArtist$Name$noMadonnaNames;

After that, checking the output of the sys.check_constraints query, you will see that it has been enabled.

CHECK Constraints Based on Simple Expressions

By far, most CHECK constraints are simple expressions that just test some characteristic of a value in a column or columns. These constraints often don’t reference any data other than the single column but can reference any of the columns in a single row.

As a few examples, consider the following:

  • Empty strings: Prevent users from inserting one or more space characters to avoid any real input into a column, such as CHECK(LEN(ColumnName) > 0). This constraint is on 90% of the character columns in databases I design, to avoid the space character entry that drives you crazy when you don’t expect it.
  • Date range checks: Make sure a reasonable date is entered; for example:
    • The date a rental is required to be returned should be greater than one day after the RentalDate (assume the two columns are implemented with the date datatype): CHECK (ReturnDate > DATEADD(DAY,1,RentalDate)).
    • The date of some event that’s supposed to have occurred already in the past: CHECK(EventDate <= GETDATE()).
  • Value reasonableness: Make sure some value, typically a number of some sort, is reasonable for the situation. “Reasonable,” of course, does not imply that the value is necessarily correct for the given situation, which is usually the domain of the middle tier of objects—just that it is within a reasonable domain of values. For example:
    • Values needing to be a nonnegative integer: CHECK(MilesDriven >= 0). This constraint is commonly needed, because there are often columns where negative values don’t make sense (hours worked, miles driven, and so on), but the intrinsic type will allow it.
    • Royalty rate for an author that’s less than or equal to 30%. If this rate ever could be greater, it isn’t a CHECK constraint. So if 15% is the typical rate, the UI might warn that it isn’t normal, but if 30% is the absolute ceiling, it would be a good CHECK constraint: CHECK (RoyaltyRate <= .3).

CHECK constraints of this variety are always a good idea when you have situations where there are data conditions that must always be true. Another way to put this is that the very definition of the data is being constrained, not just a convention that could change fairly often or even be situationally different. These CHECK constraints are generally extremely fast and won’t negatively affect performance except in extreme situations. As an example, I’ll just show the code for the first, empty string check, because simple CHECK constraints are easy to code once you have the syntax.
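
That said, the other rules look much the same once written out; for instance, the date-range rule from the list might be sketched as follows (the Rentals.Rental table and its columns are hypothetical, used only to illustrate the syntax):

ALTER TABLE Rentals.Rental WITH CHECK
   ADD CONSTRAINT CHKRental$ReturnDate$AfterRentalDate
           CHECK (ReturnDate > DATEADD(DAY,1,RentalDate));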

For example, in the Album table, the Name column doesn’t allow NULLs. The user has to enter something, but what about when the enterprising user realizes that '' is not the same as NULL? What will be the response to an empty string? Ideally, of course, the UI wouldn’t allow such nonsense for a column that had been specified as being required, but the user can just hit the space bar, so to make sure, we will code a constraint to prevent a blank column value from ever being stored.

The constraint works by using the LEN function, which ignores trailing spaces when counting characters, so a string of nothing but space characters returns a length of 0:

ALTER TABLE Music.Album WITH CHECK
   ADD CONSTRAINT CHKAlbum$Name$noEmptyString
           CHECK (LEN(Name) > 0); --note: LEN ignores trailing spaces, so any string
                                  --of all space characters will return 0

Testing this with data that will clash with the new constraint

INSERT INTO Music.Album ( AlbumId, Name, ArtistId, PublisherId, CatalogNumber )
VALUES ( 4, '', 1, 1,'dummy value' );

you get the following error message

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint "CHKAlbum$Name$noEmptyString". The conflict occurred in database "Chapter7", table "Music.Album", column 'Name'.

All too often, nonsensical data is entered just to get around your warning, but that is more of a UI or managerial oversight problem than a database design concern, because the check to see whether 'ASDFASDF' is a reasonable name value is not of the definite true/false variety. (Have you seen what some people name their kids?) What’s generally the case is that the user interface will then prevent such data from being created via the UI, but the CHECK constraint is there to prevent other processes from putting in completely invalid data, no matter what the source of data.

These check constraints are very useful when you are loading data into the table from an outside source. Often, when data is imported from a file, such as through the Import Wizard, blank data will be propagated as blank values, and the programmers involved might not think to deal with this condition. The check constraints make sure that the data is put in correctly. And when a load bypasses them, such as when using SSIS’s bulk loading features, their existence still reminds you to go back and recheck the trusted status and the values. In Figure 7-2, you will see that you can choose to (or choose not to) check constraints on the OLEDB destination output. Deselecting this option may either disable the constraint or set it to not trusted to speed loading, but it will limit the data integrity and optimizer utilization of the constraint until you reset it to trusted, as was demonstrated in the previous section.


Figure 7-2. Example SSIS OLEDB output with check constraints deselected

CHECK Constraints Using Functions

Sometimes, you really need to implement a complex data check where a simple Boolean expression using the columns in the table and base T-SQL scalar functions just won’t do. In standard SQL, you can use a subquery in your constraints, but in SQL Server, subqueries are not allowed. However, you can use a scalar T-SQL function, even if it accesses another table.

In general, using functions is a fairly atypical solution to ensuring data integrity, but it can be far more powerful and, in many ways, quite useful when you need to build slightly complex data integrity protection. For the most part, CHECK constraints usually consist of the simple task of checking a stable format or value of a single column, and for these tasks, a standard CHECK constraint using the simple <BooleanExpression> is perfectly adequate.

However, a CHECK constraint need not be so simple. A user-defined function (UDF) can be complex and might touch several tables in the instance. Here are some examples:

  • Complex scalar validations (often using CLR functions): For example, in a situation where a regular expression would be easier to use than a LIKE comparison.
  • Validations that access other tables: For example, to check a domain that is based on values in several tables, rather than a simple foreign key. In the example, I will implement an entry mask that is table based, so it changes based on a related table’s value.

I should warn you that calling a UDF has a great deal of overhead, and while you might get the urge to encapsulate a simple function for use in a CHECK constraint, it almost always isn’t worth the overhead. As we have mentioned, CHECK constraints are executed once per row affected by the DML modification statement, and this extra cost will be compounded for every row affected by the modification query. I realize that this can be counterintuitive to a good programmer thinking that encapsulation is one of the most important goals of programming, but SQL is quite different from other types of programming in many ways because of the fact that you are pushing so much of the work to the engine, and the engine has to take what you are doing and find the best way (which you must respect) of executing the code.

Hence, it’s best to try to express your Boolean expression without a UDF unless it’s entirely necessary to access additional tables or to do something more complex than a simple expression can. In the following examples, I’ll employ UDFs to provide powerful rule checking, which can implement complex rules that would prove difficult to code using a simple Boolean expression.

You can implement the UDFs in either T-SQL or a .NET language (VB.NET, C#, or any .NET language that lets you exploit the capabilities of SQL Server 2005+ to write CLR-based objects in the database). In many cases, especially if you aren’t doing any kind of table access in the code of the function, the CLR will perform much better than the T-SQL version.

As an example, I need to access values in a different table, so I’m going to build an example that implements an entry mask that varies based on the parent of a row. Consider that it’s desirable to validate that catalog numbers for albums are of the proper format. However, different publishers have different catalog number masks for their clients’ albums. (A more natural, yet distinctly more complex example would be phone numbers and addresses from around the world.)

For this example, I will continue to use the tables from the previous section. Note that the mask column, Publisher.CatalogNumberMask, needs to be considerably larger (five times larger in my example code) than the actual CatalogNumber column, because some of the possible masks use multiple characters to indicate a single character. You should also note that it’s a varchar, even though the column is stored as a char value, because using char variables as LIKE masks can be problematic because of the space padding at the end of such columns (the comparison thinks that the extra space characters that are padded on the end of the fixed-length string need to match in the target string, which is rarely what’s desired).
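
A quick standalone demonstration of that padding problem follows (the variables are purely illustrative); the char-typed mask is padded to its full length, and those trailing spaces are significant in the pattern, so the comparison fails:

DECLARE @charMask    char(20)    = '[0-9][0-9]',
        @varcharMask varchar(20) = '[0-9][0-9]';
SELECT CASE WHEN '12' LIKE @charMask    THEN 1 ELSE 0 END AS CharMaskMatches,    --returns 0
       CASE WHEN '12' LIKE @varcharMask THEN 1 ELSE 0 END AS VarcharMaskMatches; --returns 1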

To do this, I build a T-SQL function that accesses this column to check that the value matches the mask, as shown (note that we’d likely build this constraint using T-SQL rather than by using the CLR, because it accesses a table in the body of the function):

CREATE FUNCTION Music.Publisher$CatalogNumberValidate
(
   @CatalogNumber char(12),
   @PublisherId int --used to look up the publisher's CatalogNumberMask
)
RETURNS bit
AS
BEGIN
   DECLARE @LogicalValue bit, @CatalogNumberMask varchar(100);
   SELECT @LogicalValue = CASE WHEN @CatalogNumber LIKE CatalogNumberMask
                                      THEN 1
                               ELSE 0  END
   FROM   Music.Publisher
   WHERE  PublisherId = @PublisherId;
   RETURN @LogicalValue;
END;

When I loaded the data in the start of this section, I preloaded the data with valid values for the CatalogNumber and CatalogNumberMask columns:

SELECT Album.CatalogNumber, Publisher.CatalogNumberMask
FROM   Music.Album
         JOIN Music.Publisher as Publisher
            ON Album.PublisherId = Publisher.PublisherId;

This returns the following results:

CatalogNumber        CatalogNumberMask
-------------------- --------------------------------------------------------------
433-43ASD-33         [0-9][0-9][0-9]-[0-9][0-9][0-9a-z][0-9a-z][0-9a-z]-[0-9][0-9]
111-11111-11         [0-9][0-9][0-9]-[0-9][0-9][0-9a-z][0-9a-z][0-9a-z]-[0-9][0-9]
CD12345              [a-z][a-z][0-9][0-9][0-9][0-9][0-9]

Now, let’s add the constraint to the table, as shown here:

ALTER TABLE Music.Album
   WITH CHECK ADD CONSTRAINT
       CHKAlbum$CatalogNumber$CatalogNumberValidate
             CHECK (Music.Publisher$CatalogNumberValidate
                          (CatalogNumber,PublisherId) = 1);

If the constraint gives you errors because invalid data already exists in the table (perhaps because you were adding data while trying out the table or, as often happens in real development, test data was created while trying out the UI being built), you can use a query like the following to find the offending rows:

SELECT Album.Name, Album.CatalogNumber, Publisher.CatalogNumberMask
FROM Music.Album
       JOIN Music.Publisher
         ON Publisher.PublisherId = Album.PublisherId
WHERE Music.Publisher$CatalogNumberValidate(Album.CatalogNumber,Album.PublisherId) <> 1;

Now, let’s attempt to add a new row with an invalid value:

INSERT  Music.Album(AlbumId, Name, ArtistId, PublisherId, CatalogNumber)
VALUES  (4,'Who''s Next',2,2,'1');

This causes the following error, because the catalog number of '1' doesn't match the mask set up for PublisherId number 2:

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint "CHKAlbum$CatalogNumber$CatalogNumberValidate". The conflict occurred in database "Chapter7", table "Music.Album".

Now, change the catalog number to something that matches the entry mask the constraint is checking:

INSERT  Music.Album(AlbumId, Name, ArtistId, CatalogNumber, PublisherId)
VALUES  (4,'Who''s Next',2,'AC12345',2);
SELECT * FROM Music.Album;

This returns the following results, which you can see match the '[a-z][a-z][0-9][0-9][0-9][0-9][0-9]' mask set up for the publisher with PublisherId = 2:

AlbumId     Name              ArtistId    CatalogNumber        PublisherId
----------- ----------------- ----------- -------------------- -----------
1           The White Album   1           433-43ASD-33         1
2           Revolver          1           111-11111-11         1
3           Quadrophenia      2           CD12345              2
4           Who's Next        2           AC12345              2

Using this kind of approach, you can build any single-row validation code for your tables. As described previously, the constraint (and hence the UDF) will fire once for each row modified by a statement that touches a referenced column. If you are making large numbers of inserts, performance might suffer, but having data that you can trust is worth it.

We will talk about triggers later in this chapter, but alternatively, you could create a trigger that checks for the existence of any rows returned by a query, based on the query used earlier to find improper data in the table:

SELECT *
FROM   Music.Album AS Album
          JOIN Music.Publisher AS Publisher
                ON Publisher.PublisherId = Album.PublisherId
WHERE  Music.Publisher$CatalogNumberValidate
                        (Album.CatalogNumber, Album.PublisherId) <> 1;

There’s one drawback to this type of check, whether implemented in a constraint or a trigger. As it stands right now, the Album table is protected from invalid values being entered into the CatalogNumber column, but nothing stops a user from changing the CatalogNumberMask on the Publisher table to a value that existing albums no longer match. If this is a concern, you might consider adding a trigger to the Publisher table that validates changes to the mask against any existing data.
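
A rough sketch of such a trigger follows (simplified; it does not use the full trigger template that appears later in this chapter and in Appendix B). It rejects any mask change that existing catalog numbers would no longer match:

CREATE TRIGGER Music.Publisher$updateTrigger
ON Music.Publisher
AFTER UPDATE AS
BEGIN
   SET NOCOUNT ON;
   --if any existing album no longer matches its publisher's new mask, refuse the change
   IF EXISTS (SELECT *
              FROM   inserted
                       JOIN Music.Album
                           ON Album.PublisherId = inserted.PublisherId
              WHERE  Album.CatalogNumber NOT LIKE inserted.CatalogNumberMask)
     BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000,
          'The new CatalogNumberMask does not match all existing CatalogNumber values', 16;
     END;
END;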

Caution  Using user-defined functions that access other rows in the same table is dangerous, because while the data for the row being checked appears in the table as the function is executed, other rows being modified by the same statement may not yet appear. If an error condition exists only across the set of rows being modified together, your final results could end up in error.

Enhancing Errors Caused by Constraints

The real downside to check constraints is the error messages they produce upon failure. The error messages are certainly things you don’t want to show to a user, if for no other reason than they will generate service desk calls every time a typical user sees them. Dealing with these errors is one of the more annoying parts of using constraints in SQL Server.

Whenever a statement fails a constraint requirement, SQL Server provides you with an ugly message and offers no real method for displaying a clean message automatically. In this section, I’ll briefly detail a way to refine the ugly messages you get from a constraint error message, much like the error from the previous statement:

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint "CHKAlbum$CatalogNumber$CatalogNumberValidate". The conflict occurred in database "Chapter7", table "Music.Album".

I’ll show you how to map this to an error message that at least makes some sense. First, the parts of the error message are as follows:

  • Msg 547: The error number that’s passed back to the calling program. In some cases, this error number is significant; however, in most cases it’s enough to say that the error number is nonzero.
  • Level 16: A severity level for the message. Levels 0 through 18 are generally considered to be user messages, with 16 being the default. Levels 19 through 25 are severe errors that cause the connection to be severed (with a message written to the log).
  • State 0: A value from 0 to 127 that represents the state of the process when the error was raised. This value is rarely used by any process.
  • Line 1: The line in the batch or object where the error is occurring. This value can be extremely useful for debugging purposes.
  • Error description: A text explanation of the error that has occurred.

In its raw form, this is the exact error that will be sent to the client. Using TRY-CATCH error handling, we can build a simple error handler and a scheme for mapping constraints to error messages (or you can do much the same thing in client code as well for errors that you just cannot prevent from your user interface). Part of the reason we name constraints is to determine what the intent was in creating the constraint in the first place. In the following code, we’ll implement a very rudimentary error-mapping scheme by parsing the text of the name of the constraint from the message, and then we’ll look up this value in a mapping table. It isn’t a “perfect” scheme, but it does the trick when using constraints as the only data protection for a situation (it also helps you document the errors that your system may raise).

First, let’s create a mapping table where we put the name of the constraint that we’ve defined and a message that explains what the constraint means:

CREATE SCHEMA ErrorHandling; --used to hold objects for error management purposes
GO
CREATE TABLE ErrorHandling.ErrorMap
(
    ConstraintName sysname NOT NULL CONSTRAINT PKErrorMap PRIMARY KEY,
    Message         varchar(2000) NOT NULL
);
GO
INSERT ErrorHandling.ErrorMap(ConstraintName, Message)
VALUES ('CHKAlbum$CatalogNumber$CatalogNumberValidate',
        'The catalog number does not match the format set up by the Publisher');

Then, we create a procedure to do the actual mapping by taking the values that can be retrieved from the ERROR_%() procedures that are accessible in a CATCH block and using them to look up the value in the ErrorMap table:

CREATE PROCEDURE ErrorHandling.ErrorMap$MapError
(
    @ErrorNumber  int = NULL,
    @ErrorMessage nvarchar(2000) = NULL,
    @ErrorSeverity INT= NULL
) AS
  BEGIN
    SET NOCOUNT ON
    --use values in ERROR_ functions unless the user passes in values
    SET @ErrorNumber = Coalesce(@ErrorNumber, ERROR_NUMBER());
    SET @ErrorMessage = Coalesce(@ErrorMessage, ERROR_MESSAGE());
    SET @ErrorSeverity = Coalesce(@ErrorSeverity, ERROR_SEVERITY());
    --strip the constraint name out of the error message
    DECLARE @constraintName sysname;
    SET @constraintName = substring( @ErrorMessage,
                             CHARINDEX('constraint "',@ErrorMessage) + 12,
                             CHARINDEX('"',substring(@ErrorMessage,
                             CHARINDEX('constraint "',@ErrorMessage) +
                                                                12,2000))-1);
    --store off original message in case no custom message found
    DECLARE @originalMessage nvarchar(2000);
    SET @originalMessage = ERROR_MESSAGE();
    IF @ErrorNumber = 547 --constraint error
      BEGIN
        SET @ErrorMessage =
                        (SELECT message
                         FROM   ErrorHandling.ErrorMap
                         WHERE  constraintName = @constraintName);
      END
    --if the error was not found, get the original message with generic 50000 error number
    SET @ErrorMessage = ISNULL(@ErrorMessage, @originalMessage);
    THROW  50000, @ErrorMessage, @ErrorSeverity;
  END

Now, see what happens when we enter an invalid value for an album catalog number:

BEGIN TRY
     INSERT  Music.Album(AlbumId, Name, ArtistId, CatalogNumber, PublisherId)
     VALUES  (5,'who are you',2,'badnumber',2);
END TRY
BEGIN CATCH
    EXEC ErrorHandling.ErrorMap$MapError;
END CATCH

The error message is as follows:

Msg 50000, Level 16, State 1, Procedure ErrorMap$MapError, Line 24
The catalog number does not match the format set up by the Publisher

rather than

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint "CHKAlbum$CatalogNumber$CatalogNumberValidate". The conflict occurred in database "Chapter7", table "Music.Album".

This is far more pleasing, even if it was a bit of a workout getting to this new message.

DML Triggers

A trigger is a type of coded module, similar to a stored procedure, that is attached to a table or view and is executed automatically when an INSERT, UPDATE, or DELETE statement is executed on that object. While triggers share the ability to enforce data protection, they differ from constraints in being far more flexible because you can code them like stored procedures and you can introduce side effects like formatting input data or cascading any operation to another table. You can use triggers to enforce almost any business rule, and they’re especially important for dealing with situations that are too complex for a CHECK constraint to handle. We used triggers in Chapter 6 to automatically manage row update date values.

Triggers often get a bad name because they can be pretty quirky, especially because they can kill performance when you are dealing with large updates. For example, if you have a trigger on a table and try to update a million rows, you are likely to have issues. However, for most operations in a typical OLTP database, operations shouldn’t be touching more than a handful of rows at a time. Trigger usage does need careful consideration, but where they are needed, they are terribly useful. In this chapter, I will demonstrate a few uses of triggers that can’t be done automatically in SQL code any other way:

  • Perform cross-database referential integrity
  • Check inter-row rules, where just looking at the current row isn’t enough for the constraints
  • Check inter-table constraints, when rules require access to data in a different table
  • Introduce desired side effects to your data-modification queries, such as maintaining required denormalizations
  • Guarantee that no insert, update, or delete operations can be executed on a table, even if the user does have rights to perform the operation

Some of these operations could also be done in an application layer, but for the most part, these operations are far easier and safer (particularly for data integrity) when done automatically using triggers. When it comes to data protection, the primary advantage that triggers have over application code is being able to access any data in the database to do the verification without sending it to a client. In Appendix B, I will discuss the mechanics of writing triggers and their various limitations. In this chapter, I am going to create DML triggers to handle typical business needs.

There are two different types of DML triggers that we will make use of in this chapter. Each type can be useful in its own way, but they are quite different in why they are used.

  • AFTER: These triggers fire after the DML statement (INSERT/UPDATE/DELETE) has affected the table. AFTER triggers are usually used for handling rules that won’t fit into the mold of a constraint, for example, rules that require data to be stored, such as a logging mechanism. You may have a virtually unlimited number of AFTER triggers that fire on INSERT, UPDATE, and DELETE, or any combination of them.
  • INSTEAD OF: These triggers operate “instead of” the built-in command (INSERT, UPDATE, or DELETE) affecting the table or view. In this way, you can do whatever you want with the data, either doing exactly what was requested by the user or doing something completely different (you can even just ignore the operation altogether). You can have a maximum of one INSTEAD OF INSERT, UPDATE, and DELETE trigger of each type per table. It is allowed (but not a generally good idea) to combine all three into one and have a single trigger that fires for all three operations.

This section will be split between these two types of triggers because they have two very different sets of use cases. Since coding triggers is not one of the more well-trod topics in SQL Server, in Appendix B, I will introduce trigger coding techniques and provide a template that we will use throughout this chapter (it’s the template we used in Chapter 6, too).

Natively compiled triggers are available in SQL Server 2016, but are limited to AFTER triggers, as well as being subject to the current limitations on natively compiled modules.

AFTER Triggers

AFTER triggers fire after the DML statement has completed. Though triggers may not seem necessary today, back in SQL Server 6.0 and earlier, there were no CHECK constraints, and even FOREIGN KEY constraints were just being introduced, so all data protection was managed using triggers. Other than being quite cumbersome to maintain, some fairly complex systems were created using hardware that is comparable to one of my Logitech Harmony remote controls.

In this section on AFTER triggers, I will present examples that demonstrate several forms of triggers that I use to solve problems that are reasonably common. I’ll give examples of the following types of triggers:

  • Range checks on multiple rows
  • Maintaining summary values
  • Cascading inserts
  • Child-to-parent cascades
  • Relationships that span databases and servers

From these examples, you should be able to extrapolate almost any use of AFTER triggers. Just keep in mind that triggers, although not inherently terrible for performance, should be used no more than necessary.

Note  For one additional example, check the section on uniqueness in Chapter 8, where I will implement a type of uniqueness based on ranges of data using a trigger-based solution.

Range Checks on Multiple Rows

The first type of check we’ll look at is the range check, in which we want to make sure that a column is within some specific range of values. You can do range checks using a CHECK constraint to validate the data in a single row (for example, column > 10) quite easily. However, you can’t use them to validate conditions based on aggregate conditions (like SUM(column) > 10) because the CHECK constraint can only access data in the current row (and using a UDF, you can’t see the new data either.)

If you need to check that a row or set of rows doesn’t violate a given condition, usually based on an aggregate like a maximum sum, you use a trigger. As an example, I’ll look at a simple accounting system. As users deposit and withdraw money from accounts, you want to make sure that the balances never dip below zero. All transactions for a given account have to be considered.

First, we create a schema for the accounting objects:

CREATE SCHEMA Accounting;

Then, we create a table for an account and then one to contain the activity for the account:

CREATE TABLE Accounting.Account
(
        AccountNumber        char(10) NOT NULL
                  CONSTRAINT PKAccount PRIMARY KEY
        --would have other columns
);
CREATE TABLE Accounting.AccountActivity
(
        AccountNumber                char(10) NOT NULL
            CONSTRAINT FKAccount$has$Accounting_AccountActivity
                       FOREIGN KEY REFERENCES Accounting.Account(AccountNumber),
       --this might be a value that each ATM/Teller generates
        TransactionNumber            char(20) NOT NULL,
        Date                         datetime2(3) NOT NULL,
        TransactionAmount            numeric(12,2) NOT NULL,
        CONSTRAINT PKAccountActivity
                      PRIMARY KEY (AccountNumber, TransactionNumber)
);

Now, we add a trigger to the Accounting.AccountActivity table that checks that, when you sum the transaction amounts for an Account, the sum is not less than zero:

CREATE TRIGGER Accounting.AccountActivity$insertTrigger
ON Accounting.AccountActivity
AFTER INSERT AS
BEGIN
   SET NOCOUNT ON;
   SET ROWCOUNT 0; --in case the client has modified the rowcount
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
   DECLARE @msg varchar(2000),    --used to hold the error message
           @rowsAffected int = (SELECT COUNT(*) FROM inserted);
   --      @rowsAffected int = (SELECT COUNT(*) FROM deleted);
   --no need to continue on if no rows affected
   IF @rowsAffected = 0 RETURN;
   BEGIN TRY
   --[validation section]
   --disallow Transactions that would put balance into negatives
   IF EXISTS ( SELECT AccountNumber
               FROM Accounting.AccountActivity AS AccountActivity
               WHERE EXISTS (SELECT *
                             FROM   inserted
                             WHERE  inserted.AccountNumber =
                               AccountActivity.AccountNumber)
                   GROUP BY AccountNumber
                   HAVING SUM(TransactionAmount) < 0)
      BEGIN
         IF @rowsAffected = 1
              SELECT @msg = CONCAT('Account: ', AccountNumber,
                   ' TransactionNumber:',TransactionNumber, ' for amount: ', TransactionAmount,
                   ' cannot be processed as it will cause a negative balance')
              FROM   inserted;
         ELSE
           SELECT @msg = 'One of the rows caused a negative balance';
           THROW  50000, @msg, 16;
      END
   --[modification section]
   END TRY
   BEGIN CATCH
              IF @@trancount > 0
                  ROLLBACK TRANSACTION;
               THROW; --will halt the batch or be caught by the caller's catch block
     END CATCH
END;

The key to this type of trigger is to look for the existence of rows in the base table, not the rows in the inserted table, because the concern is how the inserted rows affect the overall status for an Account. The base of the query is one you might run to check the status of the table without a trigger: is there an account where the sum of its transactions is less than 0?

SELECT AccountNumber
FROM Accounting.AccountActivity AS AccountActivity
GROUP BY AccountNumber
HAVING SUM(TransactionAmount) < 0;

Then we include a correlation to the rows that have been created since this is an insert trigger:

SELECT AccountNumber
FROM Accounting.AccountActivity AS AccountActivity
WHERE EXISTS (SELECT *
              FROM   inserted
              WHERE  inserted.AccountNumber = AccountActivity.AccountNumber)
GROUP BY AccountNumber
HAVING SUM(TransactionAmount) < 0;

The WHERE clause simply makes sure that the only rows we consider are for accounts that have new data inserted. This way, we don’t end up checking all rows that we know our query hasn’t touched. Note, too, that I don’t use a JOIN operation. By using an EXISTS criterion in the WHERE clause, we don’t affect the cardinality of the set being returned in the FROM clause, no matter how many rows in the inserted table have the same AccountNumber. Now, as you can see up in the trigger, this is placed into the trigger using an IF EXISTS statement, which is then followed by the error handling stuff:

   IF EXISTS ( SELECT AccountNumber
               FROM Accounting.AccountActivity AS AccountActivity
               WHERE EXISTS (SELECT *
                             FROM   inserted
                             WHERE  inserted.AccountNumber =
                               AccountActivity.AccountNumber)
                   GROUP BY AccountNumber
                   HAVING SUM(TransactionAmount) < 0)
      BEGIN             --error handling stuff

To see it in action, use this code:

--create some setup test data
INSERT INTO Accounting.Account(AccountNumber)
VALUES ('1111111111');
INSERT INTO Accounting.AccountActivity(AccountNumber, TransactionNumber,
                                         Date, TransactionAmount)
VALUES ('1111111111','A0000000000000000001','20050712',100),
       ('1111111111','A0000000000000000002','20050713',100);

Now, let’s see what happens when we violate this rule:

INSERT  INTO Accounting.AccountActivity(AccountNumber, TransactionNumber,
                                         Date, TransactionAmount)
VALUES ('1111111111','A0000000000000000003','20050713',-300);

Here’s the result:

Msg 50000, Level 16, State 16, Procedure AccountActivity$insertTrigger, Line 40
Account: 1111111111 TransactionNumber:A0000000000000000003 for amount: -300.00 cannot be processed as it will cause a negative balance

The error message is the custom error message that we coded in the case where a single row was modified. Now, let’s make sure that the trigger works when we have greater than one row in the INSERT statement:

--create new Account
INSERT  INTO Accounting.Account(AccountNumber)
VALUES ('2222222222');
GO
--Now, this data will violate the constraint for the new Account:
INSERT  INTO Accounting.AccountActivity(AccountNumber, TransactionNumber,
                                        Date, TransactionAmount)
VALUES ('1111111111','A0000000000000000004','20050714',100),
       ('2222222222','A0000000000000000005','20050715',100),
       ('2222222222','A0000000000000000006','20050715',100),
       ('2222222222','A0000000000000000007','20050715',-201);

This causes the following error:

Msg 50000, Level 16, State 16, Procedure AccountActivity$insertTrigger, Line 40
One of the rows caused a negative balance

The multirow error message is much less informative, though you could expand it to include information about the row (or all the rows) that caused the violation, even showing the multiple failed values with a bit of work if that was an issue. Usually a simple message is sufficient, because generally if multiple rows are being modified in a single statement, it’s a batch process, and the complexity of building detailed error messages is way more than it’s worth. Processes would likely be established on how to deal with certain errors being returned.
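
If a more detailed message is warranted, one approach (just a sketch; this fragment would replace the single-line ELSE assignment inside the trigger shown earlier) is to list the offending account numbers using the same validation query:

           SELECT @msg = CONCAT('The following accounts would have a negative balance: ',
                     STUFF((SELECT ',' + AccountNumber
                            FROM   Accounting.AccountActivity AS AccountActivity
                            WHERE  EXISTS (SELECT *
                                           FROM   inserted
                                           WHERE  inserted.AccountNumber =
                                                      AccountActivity.AccountNumber)
                            GROUP  BY AccountNumber
                            HAVING SUM(TransactionAmount) < 0
                            FOR XML PATH('')), 1, 1, ''));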

Image Tip  Error handling will be covered in more detail in the “Dealing with Triggers and Constraint Errors” section later in this chapter. I have used a very simple error model for these triggers that simply rethrows the error that occurs in the trigger after rolling back the transaction.

If this is a real accounting-oriented table, then an INSERT trigger may be enough, as a true accounting system handles modifications with offsetting entries (delete a $100 charge with a -$100 charge). However, if you did need to process deletes, you would change the base query to

SELECT AccountNumber
FROM Accounting.AccountActivity AS AccountActivity
WHERE EXISTS (SELECT *
              FROM   deleted
              WHERE  deleted.AccountNumber = AccountActivity.AccountNumber)
GROUP BY AccountNumber
HAVING SUM(TransactionAmount) < 0;

For the UPDATE trigger (or a trigger that handles both UPDATE and DELETE), we need to check both tables (in case someone updates the AccountNumber, if that is allowed):

SELECT AccountNumber
FROM Accounting.AccountActivity AS AccountActivity
WHERE EXISTS (SELECT *
              FROM   (SELECT AccountNumber
                      FROM   deleted
                      UNION ALL
                      SELECT AccountNumber
                      FROM   inserted) AS CheckThese
              WHERE  CheckThese.AccountNumber = AccountActivity.AccountNumber)
GROUP BY AccountNumber
HAVING SUM(TransactionAmount) < 0;

Changeable keys, such as an account number that can be modified on a transaction, trip up a lot of people when implementing data checking that spans multiple rows.

Viewing Trigger Events

To see the events for which a trigger fires, you can use the following query:

SELECT trigger_events.type_desc
FROM sys.trigger_events
         JOIN sys.triggers
                  ON sys.triggers.object_id = sys.trigger_events.object_id
WHERE  triggers.name = 'AccountActivity$insertTrigger';
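
For the AccountActivity$insertTrigger we created earlier, this should return a single row, something like the following (shown as an expectation, since the trigger was declared for the INSERT action only):

type_desc
------------
INSERT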

Maintaining Summary Values

While in the first five chapters of the book I preached strenuously against maintaining summary values, there are cases where some form of active summarization may be necessary. For example:

  • There is no other reasonable method available to optimize a process.
  • The summary values are read far more often than the underlying detail rows are modified.

As an example, let’s extend the previous example of the Account and AccountActivity tables from the “Range Checks on Multiple Rows” section. To the Account table, I will add a BalanceAmount column:

ALTER TABLE Accounting.Account
   ADD BalanceAmount numeric(12,2) NOT NULL
      CONSTRAINT DFLTAccount_BalanceAmount DEFAULT (0.00);

Then, I will update the BalanceAmount column to match the current sum of the TransactionAmount values in the AccountActivity rows. First, run this query to view the expected values:

SELECT  Account.AccountNumber,
        SUM(COALESCE(AccountActivity.TransactionAmount,0.00)) AS NewBalance
FROM   Accounting.Account
         LEFT OUTER JOIN Accounting.AccountActivity
            ON Account.AccountNumber = AccountActivity.AccountNumber
GROUP  BY Account.AccountNumber;

This returns the following:

AccountNumber NewBalance
------------- ---------------------------------------
1111111111    200.00
2222222222    0.00

Now, update the BalanceAmount column values to the existing rows using the following statement:

WITH  UpdateCTE AS (
SELECT  Account.AccountNumber,
        SUM(coalesce(TransactionAmount,0.00)) AS NewBalance
FROM   Accounting.Account
        LEFT OUTER JOIN Accounting.AccountActivity
            On Account.AccountNumber = AccountActivity.AccountNumber
GROUP  BY Account.AccountNumber)
UPDATE Account
SET    BalanceAmount = UpdateCTE.NewBalance
FROM   Accounting.Account
         JOIN UpdateCTE
                ON Account.AccountNumber = UpdateCTE.AccountNumber;

That statement forms the basis of the changes to the trigger that we created in the previous section (the changes are in the [modification section] of the trigger). The only change needed is to filter the Account rows being summarized down to the accounts affected by the DML statement that caused the trigger to fire, using the same EXISTS technique as in the previous section, so it doesn’t matter whether one new row was created for an account or 100.

ALTER TRIGGER Accounting.AccountActivity$insertTrigger
ON Accounting.AccountActivity
AFTER INSERT AS
BEGIN
   SET NOCOUNT ON;
   SET ROWCOUNT 0; --in case the client has modified the rowcount
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
   DECLARE @msg varchar(2000),    --used to hold the error message
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
           @rowsAffected int = (SELECT COUNT(*) FROM inserted);
   --      @rowsAffected int = (SELECT COUNT(*) FROM deleted);
   BEGIN TRY
   --[validation section]
   --disallow Transactions that would put balance into negatives
   IF EXISTS ( SELECT AccountNumber
               FROM Accounting.AccountActivity as AccountActivity
               WHERE EXISTS (SELECT *
                             FROM   inserted
                             WHERE  inserted.AccountNumber =
                               AccountActivity.AccountNumber)
                   GROUP BY AccountNumber
                   HAVING SUM(TransactionAmount) < 0)
      BEGIN
         IF @rowsAffected = 1
             SELECT @msg = 'Account: ' + AccountNumber +
                  ' TransactionNumber:' +
                   cast(TransactionNumber as varchar(36)) +
                   ' for amount: ' + cast(TransactionAmount as varchar(10))+
                   ' cannot be processed as it will cause a negative balance'
             FROM   inserted;
        ELSE
          SELECT @msg = 'One of the rows caused a negative balance';
          THROW  50000, @msg, 16;
      END;
    --[modification section]
    IF UPDATE (TransactionAmount)
      BEGIN
        ;WITH  Updater as (
        SELECT  Account.AccountNumber,
                SUM(coalesce(TransactionAmount,0.00)) AS NewBalance
        FROM   Accounting.Account
                LEFT OUTER JOIN Accounting.AccountActivity
                    On Account.AccountNumber = AccountActivity.AccountNumber
               --This where clause limits the summarizations to those rows
               --that were modified by the DML statement that caused
               --this trigger to fire.
        WHERE  EXISTS (SELECT *
                       FROM   Inserted
                       WHERE  Account.AccountNumber = Inserted.AccountNumber)
        GROUP  BY Account.AccountNumber)
        UPDATE Account
        SET    BalanceAmount = Updater.NewBalance
        FROM   Accounting.Account
                  JOIN Updater
                      ON Account.AccountNumber = Updater.AccountNumber;
      END;
   END TRY
   BEGIN CATCH
              IF @@trancount > 0
                  ROLLBACK TRANSACTION;
              THROW; --will halt the batch or be caught by the caller’s catch block
   END CATCH;
END;

Now, insert a new row into AccountActivity:

INSERT  INTO Accounting.AccountActivity(AccountNumber, TransactionNumber,
                                        Date, TransactionAmount)
VALUES ('1111111111','A0000000000000000004','20050714',100);

Next, examine the state of the Account table, comparing it to the query used previously to check what the balances should be:

SELECT  Account.AccountNumber,Account.BalanceAmount,
        SUM(coalesce(AccountActivity.TransactionAmount,0.00)) AS SummedBalance
FROM   Accounting.Account
        LEFT OUTER JOIN Accounting.AccountActivity
            ON Account.AccountNumber = AccountActivity.AccountNumber
GROUP  BY Account.AccountNumber,Account.BalanceAmount;

This returns the following, showing that the sum is the same as the stored balance:

AccountNumber BalanceAmount    SummedBalance
------------- ---------------- -------------------
1111111111    300.00           300.00
2222222222    0.00             0.00

The next step—the multirow test—is very important when building a trigger such as this. You need to be sure that if a user inserts more than one row at a time, it will work. In our example, we will insert rows for both accounts in the same DML statement and two rows for one of the accounts. This is not a sufficient test necessarily, but it’s enough for demonstration purposes at least:

INSERT  into Accounting.AccountActivity(AccountNumber, TransactionNumber,
                                        Date, TransactionAmount)
VALUES ('1111111111','A0000000000000000005','20050714',100),
       ('2222222222','A0000000000000000006','20050715',100),
       ('2222222222','A0000000000000000007','20050715',100);

Again, the query on the AccountActivity and Account tables should show the same balances:

AccountNumber BalanceAmount    SummedBalance
------------- ---------------- -------------------
1111111111    400.00           400.00
2222222222    200.00           200.00

If you wanted an UPDATE or DELETE trigger, you would simply apply the same logic in the EXISTS expression as in the previous section.
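
For example, here is a sketch of how the EXISTS filter inside the Updater CTE could look in a combined UPDATE and DELETE version of the trigger, checking any account referenced in either the inserted or the deleted virtual table (only the WHERE clause changes):

        WHERE  EXISTS (SELECT *
                       FROM   (SELECT AccountNumber
                               FROM   deleted
                               UNION ALL
                               SELECT AccountNumber
                               FROM   inserted) AS CheckThese
                       WHERE  CheckThese.AccountNumber = Account.AccountNumber)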

I can’t stress enough that this type of summary data strategy should be the rare exception, not the rule. But when you do have to maintain summary data, using a trigger to keep it synchronized is a great strategy. Summary data that doesn’t match its supposed source is one of the more frustrating problems to deal with, because tracking down the mismatch takes time away from making progress on new software.

Image Caution  When introducing triggers, make sure you test single-row and multirow operations, all variations of INSERTs, UPDATEs, and DELETEs, and what happens if you modify the primary key, or key you are using to group data on. In Appendix B, I will also cover some settings that determine the behavior if you have multiple triggers on the same operation that must also be tested.

Cascading Inserts

A cascading insert refers to the situation whereby after a row is inserted into a table, one or more other new rows are automatically inserted into other tables. This is frequently done when you need to initialize a row in another table, quite often a status of some sort.

For this example, we’re going to build a small system to store URLs for a website-linking system. During low-usage periods, an automated browser connects to the URLs so that they can be verified (hopefully, limiting broken links on web pages).

To implement this, I’ll use the set of tables in Figure 7-3.

9781484219720_7_Fig3.jpg

Figure 7-3. Storing URLs for a website-linking system

CREATE SCHEMA Internet;
GO
CREATE TABLE Internet.Url
(
    UrlId int NOT NULL IDENTITY(1,1) CONSTRAINT PKUrl primary key,
    Name  varchar(60) NOT NULL CONSTRAINT AKUrl_Name UNIQUE,
    Url   varchar(200) NOT NULL CONSTRAINT AKUrl_Url UNIQUE
);
--Not a user manageable table, so not using identity key (as discussed in
--Chapter 6 when I discussed choosing keys) in this one table.  Others are
--using identity-based keys in this example.
CREATE TABLE Internet.UrlStatusType
(
        UrlStatusTypeId  int NOT NULL
                      CONSTRAINT PKUrlStatusType PRIMARY KEY,
        Name varchar(20) NOT NULL
                      CONSTRAINT AKUrlStatusType UNIQUE,
        DefaultFlag bit NOT NULL,
        DisplayOnSiteFlag bit NOT NULL
);
CREATE TABLE Internet.UrlStatus
(
        UrlStatusId int NOT NULL IDENTITY(1,1)
                      CONSTRAINT PKUrlStatus PRIMARY KEY,
        UrlStatusTypeId int NOT NULL
                      CONSTRAINT
               FKUrlStatusType$defines_status_type_of$Internet_UrlStatus
                      REFERENCES Internet.UrlStatusType(UrlStatusTypeId),
        UrlId int NOT NULL
          CONSTRAINT FKUrl$has_status_history_in$Internet_UrlStatus
                      REFERENCES Internet.Url(UrlId),
        ActiveTime        datetime2(3),
        CONSTRAINT AKUrlStatus_statusUrlDate
                      UNIQUE (UrlStatusTypeId, UrlId, ActiveTime)
);
--set up status types
INSERT  Internet.UrlStatusType (UrlStatusTypeId, Name,
                                   DefaultFlag, DisplayOnSiteFlag)
VALUES (1, 'Unverified',1,0),
       (2, 'Verified',0,1),
       (3, 'Unable to locate',0,0);

The Url table holds URLs to different sites on the Web. When someone enters a URL, we initialize the status to ’Unverified’. A process should be in place in which the site is checked often to make sure nothing has changed (particularly the unverified ones!).

You begin by building a trigger that inserts a row into the UrlStatus table on an insert that creates a new row with the UrlId and the default UrlStatusType based on DefaultFlag having the value of 1:

CREATE TRIGGER Internet.Url$insertTrigger
ON Internet.Url
AFTER INSERT AS
BEGIN
   SET NOCOUNT ON;
   SET ROWCOUNT 0; --in case the client has modified the rowcount
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
   DECLARE @msg varchar(2000),    --used to hold the error message
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
           @rowsAffected int = (SELECT COUNT(*) FROM inserted);
   --           @rowsAffected int = (SELECT COUNT(*) FROM deleted);
   BEGIN TRY
          --[validation section]
          --[modification section]
          --add a row to the UrlStatus table to tell it that the new row
          --should start out as the default status
          INSERT INTO Internet.UrlStatus (UrlId, UrlStatusTypeId, ActiveTime)
          SELECT inserted.UrlId, UrlStatusType.UrlStatusTypeId,
                  SYSDATETIME()
          FROM inserted
                CROSS JOIN (SELECT UrlStatusTypeId
                            FROM   UrlStatusType
                            WHERE  DefaultFlag = 1)  as UrlStatusType;
                                           --use cross join to apply this one row to
                                           --rows in inserted
   END TRY
   BEGIN CATCH
              IF @@TRANCOUNT > 0
                  ROLLBACK TRANSACTION;
              THROW; --will halt the batch or be caught by the caller’s catch block
     END CATCH;
END;

The idea here is that for every row in the inserted table, we’ll get the single row from the UrlStatusType table that has DefaultFlag equal to 1. So, let’s try it:

INSERT  Internet.Url(Name, Url)
VALUES ('Author''s Website',
        'http://drsql.org');
SELECT Url.Url,Url.Name,UrlStatusType.Name as Status, UrlStatus.ActiveTime
FROM   Internet.Url
          JOIN Internet.UrlStatus
             ON Url.UrlId = UrlStatus.UrlId
          JOIN Internet.UrlStatusType
             ON UrlStatusType.UrlStatusTypeId = UrlStatus.UrlStatusTypeId;

This returns the following results:

Url                 Name                  Status               ActiveTime
------------------- --------------------- -------------------- ---------------------------
http://drsql.org    Author’s Website      Unverified           2016-04-11 20:43:52.954

Image Tip  It’s easier if users can’t modify the data in tables such as the UrlStatusType table, so there cannot be a case where there’s no status set as the default (or too many rows). If there were no default status, the URL would never get used, because the processes wouldn’t see it. One of the later examples in this chapter will be to force no action to be performed for a DML operation.

Cascading from Child to Parent

All the cascade operations on updates that you can do with constraints (CASCADE or SET NULL) are strictly from parent to child. Sometimes, you want to go the other way around and delete the parents of a row when you delete the child. Typically, you do this when the child is what you’re interested in and the parent is simply maintained as an attribute of the child that is only desired when one or more child rows exist. Also typical of this type of situation is that you want to delete the parent only if all children are deleted.

In our example, we have a small model of my console game collection. I have several game systems and quite a few games. Often, I have the same game on multiple platforms, so I want to track this fact, especially if I want to trade a game that I have on multiple platforms for something else. So, I have a table for the GamePlatform (the system) and another for the actual Game itself. This is a many-to-many relationship, so I have an associative entity called GameInstance to record ownership, as well as when the game was purchased for the given platform. Each of these tables has a delete-cascade relationship, so all instances are removed. What about the games, though? If all GameInstance rows are removed for a given game, I want to delete the game from the database. The tables are shown in Figure 7-4.

9781484219720_7_Fig4.jpg

Figure 7-4. The game tables

--start a schema for entertainment-related tables
CREATE SCHEMA Entertainment;
GO
CREATE TABLE Entertainment.GamePlatform
(
    GamePlatformId int NOT NULL CONSTRAINT PKGamePlatform PRIMARY KEY,
    Name  varchar(50) NOT NULL CONSTRAINT AKGamePlatform_Name UNIQUE
);
CREATE TABLE Entertainment.Game
(
    GameId  int NOT NULL CONSTRAINT PKGame PRIMARY KEY,
    Name    varchar(50) NOT NULL CONSTRAINT AKGame_Name UNIQUE
    --more details that are common to all platforms
);
--associative entity with cascade relationships back to Game and GamePlatform
CREATE TABLE Entertainment.GameInstance
(
    GamePlatformId int NOT NULL,
    GameId int NOT NULL,
    PurchaseDate date NOT NULL,
    CONSTRAINT PKGameInstance PRIMARY KEY (GamePlatformId, GameId),
    CONSTRAINT FKGame$is_owned_on_platform_by$EntertainmentGameInstance
                  FOREIGN KEY (GameId)
                           REFERENCES Entertainment.Game(GameId) ON DELETE CASCADE,
      CONSTRAINT FKGamePlatform$is_linked_to$EntertainmentGameInstance
                  FOREIGN KEY (GamePlatformId)
                           REFERENCES Entertainment.GamePlatform(GamePlatformId)
                ON DELETE CASCADE
);

Then, I insert a sampling of data:

INSERT  Entertainment.Game (GameId, Name)
VALUES (1,'Disney Infinity'),
       (2,'Super Mario Bros');
INSERT  Entertainment.GamePlatform(GamePlatformId, Name)
VALUES (1,'Nintendo WiiU'),   --Yes, as a matter of fact I am still a
       (2,'Nintendo 3DS');     --Nintendo Fanboy, why do you ask?
INSERT  Entertainment.GameInstance(GamePlatformId, GameId, PurchaseDate)
VALUES (1,1,'20140804'),
       (1,2,'20140810'),
       (2,2,'20150604');
--the full outer joins ensure that all rows are returned from all sets, leaving
--nulls where data is missing
SELECT  GamePlatform.Name as Platform, Game.Name as Game, GameInstance.PurchaseDate
FROM    Entertainment.Game as Game
            FULL OUTER JOIN Entertainment.GameInstance as GameInstance
                    ON Game.GameId = GameInstance.GameId
            FULL OUTER JOIN Entertainment.GamePlatform
                    ON GamePlatform.GamePlatformId = GameInstance.GamePlatformId;

As you can see, I have two games for WiiU and only a single one for Nintendo 3DS:

Platform             Game                 PurchaseDate
-------------------- -------------------- ------------
Nintendo WiiU        Disney Infinity      2014-08-04
Nintendo WiiU        Super Mario Bros     2014-08-10
Nintendo 3DS         Super Mario Bros     2015-06-04

So, I create a trigger on the table to do the “reverse” cascade operation:

CREATE TRIGGER Entertainment.GameInstance$deleteTrigger
ON Entertainment.GameInstance
AFTER DELETE AS
BEGIN
   SET NOCOUNT ON;
   SET ROWCOUNT 0; --in case the client has modified the rowcount
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
   DECLARE @msg varchar(2000),    --used to hold the error message
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
   --        @rowsAffected int = (SELECT COUNT(*) FROM inserted);
             @rowsAffected int = (SELECT COUNT(*) FROM deleted);
   BEGIN TRY
        --[validation section]
        --[modification section]
                         --delete all Games
        DELETE Game      --where the GameInstance was deleted
        WHERE  GameId IN (SELECT deleted.GameId
                          FROM   deleted     --and there are no GameInstances left
                          WHERE  NOT EXISTS (SELECT  *     
                                              FROM    GameInstance
                                              WHERE   GameInstance.GameId =
                                                               deleted.GameId));
   END TRY
   BEGIN CATCH
              IF @@TRANCOUNT > 0
                  ROLLBACK TRANSACTION;
              THROW; --will halt the batch or be caught by the caller’s catch block
   END CATCH;
END;

It’s as straightforward as that. Just delete the GameInstance rows, and let the trigger cover the rest. Delete the rows for the WiiU platform:

DELETE  Entertainment.GameInstance
WHERE   GamePlatformId = 1;

Next, check the data:

SELECT  GamePlatform.Name AS Platform, Game.Name AS Game, GameInstance.PurchaseDate
FROM    Entertainment.Game AS Game
            FULL OUTER JOIN Entertainment.GameInstance as GameInstance
                    ON Game.GameId = GameInstance.GameId
            FULL OUTER JOIN Entertainment.GamePlatform
                    ON GamePlatform.GamePlatformId = GameInstance.GamePlatformId;

You can see that now I have only a single row in the Game table:

platform             Game                 PurchaseDate
-------------------- -------------------- ------------
Nintendo 3DS         Super Mario Bros     2015-06-04
Nintendo WiiU        NULL                 NULL

This shows us that the Game row was deleted when all instances were removed, but the platform remains. (The technique of using FULL OUTER JOINs like this will help you to be able to see permutations of matched rows. Reinsert the Game row for ’Disney Infinity’ and you will see another row show up that has NULL for Platform and PurchaseDate.)
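
For example, the following reinserts the game (GameId 1 is free again, since the trigger removed that Game row along with its last instance); rerunning the FULL OUTER JOIN query will then show the extra permutation with NULLs for Platform and PurchaseDate:

INSERT Entertainment.Game (GameId, Name)
VALUES (1,'Disney Infinity');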

Relationships That Span Databases

Prior to constraints, all relationships were enforced by triggers. Thankfully, when it comes to relationships, triggers are now relegated to enforcing special cases of relationships, such as when you have relationships between tables that are on different databases. It is more of an academic exercise at this point, as for most readers this will not occur. However, it is a good exercise, forcing you to think about all of the different changes that can go into using triggers to cover circumstances from multiple tables.

To implement a relationship using triggers, you need several triggers:

  • Parent:
    • UPDATE: Disallow the changing of keys if child values exist, or cascade the update.
    • DELETE: Prevent the deletion of rows that have associated child rows, or cascade the delete to those child rows.
  • Child:
    • INSERT: Check to make sure the key exists in the parent table.
    • UPDATE: Check to make sure the “possibly” changed key exists in the parent table.

To begin this section, I will present templates to use to build these triggers, and then in the final section, I will code a complete trigger for demonstration. For these snippets of code, I refer to the tables as parent and child, with no schema or database named. Replacing the bits that are inside these greater-than and less-than symbols with appropriate code and table names that include the database and schema gives you the desired result when plugged into the trigger templates we’ve been using throughout this chapter.

Parent Update

Note that you can omit the parent update step if using surrogate keys based on identity property columns, because they aren’t editable and hence cannot be changed.

There are a few possibilities you might want to implement:

  • Cascading operations to child rows
  • Preventing updating parent if child rows exist

Cascading an update is not possible from a proper, generic trigger-coding standpoint. The problem is that if you modify the key of one or more parent rows in a statement that fires the trigger, there is not necessarily any way to correlate rows in the inserted table with rows in the deleted table, leaving you unable to know which inserted row is supposed to match which deleted row. So, I would not implement the cascading of a parent key change in a trigger; I would do this in external code, if you find the need for editable keys that cascade where a FOREIGN KEY constraint is not allowed (which should be pretty uncommon).

Preventing an update of parent rows where child rows exist is very straightforward. The idea here is that you want to take the same restrictive action as the NO ACTION clause on a relationship, for example:

IF UPDATE(<parent_key_columns>)
   BEGIN
          IF EXISTS ( SELECT  *
                      FROM    deleted
                                 JOIN <child>
                                    ON <child>.<parent_keys> =
                                                   deleted.<parent_keys>
                    )
          BEGIN
             IF @rowsAffected = 1
                     SELECT @msg = 'one row message' + inserted.somedata
                     FROM   inserted;
                  ELSE
                     SELECT @msg = 'multi-row message';
             THROW 50000, @msg, 16;
          END;
   END;

Parent Delete

Like the update, when a parent table row is deleted, we can either

  • Cascade the delete to child rows
  • Prevent deleting parent rows if child rows exist

Cascading is very simple. For the delete, you simply use a correlated EXISTS subquery to delete the rows in the child table that match the parent rows in the deleted virtual table:

DELETE <child>
WHERE  EXISTS ( SELECT *
                FROM   deleted
                WHERE  <child>.<parent_key> = deleted.<parent_key>);

To prevent the delete from happening when a child row exists, here’s the basis of code to prevent deleting parent rows that have associated child rows:

IF EXISTS  ( SELECT   *
             FROM     deleted
                          JOIN <child>
                              ON <child>.<parent_key> = deleted.<parent_key>
             )
    BEGIN
         IF @rowsAffected = 1
             SELECT @msg = 'one row message' + deleted.somedata
             FROM   deleted;
         ELSE
             SELECT @msg = 'multi-row message';
         THROW 50000, @msg, 16;
     END;

Child Insert and Child Update

On the child table, the goal will basically be to make sure that for every value you create in the child table, there exists a corresponding row in the parent table. The following snippet does this and takes into consideration the case where null values are allowed as well:

  --@rowsAffected is part of the standard template
  DECLARE @nullcount int,
          @validcount int;
  IF UPDATE(<parent_key>)
   BEGIN
      --you can omit this check if nulls are not allowed
      SELECT  @nullcount = COUNT(*)
      FROM    inserted
      WHERE   inserted.<parent_key> IS NULL;
      --does not count null values
      SELECT  @validcount = COUNT(*)
      FROM    inserted
                 JOIN <parent> AS Parent
                        ON  inserted.<parent_keys> = Parent.<parent_keys>;
      IF @validcount + @nullcount <> @rowsAffected
        BEGIN
            IF @rowsAffected = 1
               SELECT @msg = 'The inserted <parent_key_name>: '
                                + CAST(<parent_key> AS varchar(10))
                                + ' is not valid in the parent table.'
               FROM   inserted;
            ELSE
               SELECT @msg = 'Invalid <parent key column name> in the inserted rows.';
            THROW 50000, @msg, 16;
         END;
    END;

Using basic blocks of code such as these, you can validate almost any foreign key relationship using triggers. For example, say you have a table in your PhoneData database called Logs.Call, with a primary key of CallId. In the CRM database, you have a Contacts.Journal table that stores contacts made to a person. To implement the child insert trigger, just fill in the blanks. (The update trigger would be nearly identical, and the two could be combined if only one trigger will ever be needed. This code will not execute as is; it is for illustration purposes only.)

CREATE TRIGGER Contacts.Journal$insertTrigger
ON Contacts.Journal
AFTER INSERT AS
BEGIN
   SET NOCOUNT ON;
   SET ROWCOUNT 0; --in case the client has modified the rowcount
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
   DECLARE @msg varchar(2000),    --used to hold the error message
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
           @rowsAffected int = (SELECT COUNT(*) FROM inserted);
   --           @rowsAffected int = (SELECT COUNT(*) FROM deleted);
   BEGIN TRY
      --[validation section]
      --@rowsAffected is part of the standard template
      DECLARE @nullcount int,
              @validcount int;
      IF UPDATE(CallId)
       BEGIN
          --omit this check if nulls are not allowed
          --(left in here for an example)
          SELECT  @nullcount = COUNT(*)
          FROM    inserted
          WHERE   inserted.CallId IS NULL;
          --does not include null values
          SELECT  @validcount = count(*)
          FROM    inserted
                     JOIN PhoneData.Logs.Call AS Parent
                            on  inserted.CallId = Parent.CallId;
          IF @validcount + @nullcount <> @rowsAffected
            BEGIN
                IF @rowsAffected = 1
                   SELECT @msg = 'The inserted CallId: '
                                    + cast(CallId AS varchar(10))
                                    + ' is not valid in the'
                                    + ' PhoneData.Logs.Call table.'
                    FROM   inserted;
                ELSE
                    SELECT @msg = 'Invalid CallId in the inserted rows.';
                THROW  50000, @Msg, 1;
             END
        END
        --[modification section]
   END TRY
   BEGIN CATCH
              IF @@trancount > 0
                  ROLLBACK TRANSACTION
              THROW; --will halt the batch or be caught by the caller’s catch block
   END CATCH;
END;

INSTEAD OF Triggers

As explained in the introduction to the "DML Triggers" section, INSTEAD OF triggers fire before the DML action is applied by the SQL engine, rather than after it as AFTER triggers do. In fact, when you have an INSTEAD OF trigger on a table, it’s the first thing that’s done when you INSERT, UPDATE, or DELETE from a table. These triggers are named INSTEAD OF because they fire instead of the native action the user executed. Inside the trigger, you perform the action—either the action that the user performed or some other action. One thing that makes these triggers useful is that you can use them on views to make what would otherwise be noneditable views editable. Doing this, you encapsulate calls to all the affected tables in the trigger, much like you would a stored procedure, except now this view has all the external properties of a physical table, hiding the actual implementation from users.

Probably the most obvious limitation of INSTEAD OF triggers is that you can have only one for each action (INSERT, UPDATE, and DELETE) on the table. It is also possible to combine triggered actions just like you can for AFTER triggers, such as having one INSTEAD OF trigger for both INSERT and UPDATE (something I advise against even more strongly for almost all uses of INSTEAD OF triggers). We’ll use a slightly modified version of the same trigger template that we used for the T-SQL AFTER triggers, covered in more detail in Appendix B.

I most often use INSTEAD OF triggers to set or modify values in my statements automatically so that the values are set to what I want, no matter what the client sends in a statement, much like we did in Chapter 6 with the rowLastModifiedTime and rowCreatedTime columns. If you record last update times through client calls, it can be problematic if one of the client’s clocks is a minute, a day, or even a year off. (You see this all the time in applications. My favorite example is a system where phone calls appeared to be taking negative amounts of time because the client was reporting when something started and the server was recording when it stopped.) You can extend the paradigm of setting a value to formatting any data, such as if you wanted to make all data that was stored lowercase.
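
As a quick illustration of that pattern, here is a minimal sketch of an INSTEAD OF INSERT trigger that overrides whatever the client sends for a row-created time column; the Examples.Widget table and its columns are hypothetical, not one of this chapter’s example tables:

CREATE TRIGGER Examples.Widget$InsteadOfInsertTrigger
ON Examples.Widget
INSTEAD OF INSERT AS
BEGIN
   SET NOCOUNT ON;
   --perform the INSERT the client attempted, but ignore any value that was
   --passed in for RowCreatedTime and set it on the server side instead
   INSERT INTO Examples.Widget (WidgetName, RowCreatedTime)
   SELECT WidgetName, SYSDATETIME()
   FROM   inserted;
END;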

I’ll demonstrate two ways you can use INSTEAD OF triggers:

  • Redirecting invalid data to an exception table
  • Forcing no action to be performed on a table, even by someone who technically has proper rights

It’s generally a best practice not to use INSTEAD OF triggers to do error raising validations such as we did with AFTER triggers. Typically, an INSTEAD OF trigger is employed to make things happen in the background in a silent manner.

Redirecting Invalid Data to an Exception Table

On some occasions, instead of returning an error when an invalid value is set for a column, you simply want to ignore it and log that an error had occurred. Generally, this wouldn’t be used for bulk loading data (using SSIS’s facilities to do this is a much better idea), but some examples of why you might do this follow:

  • Heads-down key entry: In many shops where customer feedback forms or payments are received by the hundreds or thousands, there are people who open the mail, read it, and key in what’s on the page. These people become incredibly skilled in rapid entry and generally make few mistakes. The mistakes they do make don’t raise an error on their screens; rather, they fall to other people—exception handlers—to fix. You could use an INSTEAD OF trigger to redirect the wrong data to an exception table to be handled later.
  • Values that are read in from devices: An example of this is on an assembly line, where a reading is taken but is so far out of range it couldn’t be true, because of the malfunction of a device or just a human moving a sensor. Too many exception rows would require a look at the equipment, but only a few might be normal and acceptable. Another possibility is when someone scans a printed page using a scanner and inserts the data. Often, the values read are not right and have to be checked manually.

For our example, I’ll design a table to take weather readings from a single thermometer. Sometimes, this thermometer sends back bad values that are impossible. We need to be able to insert readings, sometimes many at a time, because the device can cache results for a while if there is signal loss, and we want the impossible rows set aside rather than having the whole batch rejected.

We build the following table, initially using a constraint to implement the simple sanity check. In the analysis of the data, we might find anomalies, but in this process, all we’re going to do is look for the “impossible” cases:

CREATE SCHEMA Measurements;
GO
CREATE TABLE Measurements.WeatherReading
(
    WeatherReadingId int NOT NULL IDENTITY
          CONSTRAINT PKWeatherReading PRIMARY KEY,
    ReadingTime   datetime2(3) NOT NULL
          CONSTRAINT AKWeatherReading_Date UNIQUE,
    Temperature     float NOT NULL
          CONSTRAINT CHKWeatherReading_Temperature
                      CHECK(Temperature BETWEEN -80 and 150)
                      --raised from last edition for global warming
);

Then, we go to load the data, simulating what we might do when importing the data all at once:

INSERT  INTO Measurements.WeatherReading (ReadingTime, Temperature)
VALUES ('20160101 0:00',82.00), ('20160101 0:01',89.22),
       ('20160101 0:02',600.32),('20160101 0:03',88.22),
       ('20160101 0:04',99.01);

As we know with CHECK constraints, this isn’t going to fly:

Msg 547, Level 16, State 0, Line 741
The INSERT statement conflicted with the CHECK constraint "CHKWeatherReading_Temperature". The conflict occurred in database "Chapter7", table "Measurements.WeatherReading", column ’Temperature’.

Select all the data in the table, and you’ll see that this data never gets entered. Does this mean we have to dig through every row individually? Yes, in the current scheme. Or you could insert each row individually, which would take a lot more work for the server, but if you’ve been following along, you know we’re going to write an INSTEAD OF trigger to do this for us. First we add a table to hold the exceptions to the Temperature rule:

CREATE TABLE Measurements.WeatherReading_exception
(
    WeatherReadingId  int NOT NULL IDENTITY
          CONSTRAINT PKWeatherReading_exception PRIMARY KEY,
    ReadingTime       datetime2(3) NOT NULL,
    Temperature       float NULL
);

Then, we create the trigger:

CREATE TRIGGER Measurements.WeatherReading$InsteadOfInsertTrigger
ON Measurements.WeatherReading
INSTEAD OF INSERT AS
BEGIN
   SET NOCOUNT ON;
   SET ROWCOUNT 0; --in case the client has modified the rowcount
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
   DECLARE @msg varchar(2000),    --used to hold the error message
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
           @rowsAffected int = (SELECT COUNT(*) FROM inserted);
   --      @rowsAffected int = (SELECT COUNT(*) FROM deleted);
   BEGIN TRY
          --[validation section]
          --[modification section]
          --<perform action>
           --BAD data
          INSERT Measurements.WeatherReading_exception (ReadingTime, Temperature)
          SELECT ReadingTime, Temperature
          FROM   inserted
          WHERE  NOT(Temperature BETWEEN -80 and 150);
           --GOOD data
          INSERT Measurements.WeatherReading (ReadingTime, Temperature)
          SELECT ReadingTime, Temperature
          FROM   inserted
          WHERE  (Temperature BETWEEN -80 and 150);
   END TRY
   BEGIN CATCH
              IF @@trancount > 0
                  ROLLBACK TRANSACTION;
              THROW; --will halt the batch or be caught by the caller’s catch block
     END CATCH;
END;

Now, we try to insert the rows with the bad data still in there:

INSERT  INTO Measurements.WeatherReading (ReadingTime, Temperature)
VALUES ('20160101 0:00',82.00), ('20160101 0:01',89.22),
       ('20160101 0:02',600.32),('20160101 0:03',88.22),
       ('20160101 0:04',99.01);
SELECT *
FROM Measurements.WeatherReading;

The good data is in the following output:

WeatherReadingId ReadingTime               Temperature
---------------- ------------------------- ----------------------
4                2016-01-01 00:00:00.000   82
5                2016-01-01 00:01:00.000   89.22
6                2016-01-01 00:03:00.000   88.22
7                2016-01-01 00:04:00.000   99.01

The nonconforming data can be seen by viewing the data in the exception table:

SELECT *
FROM   Measurements.WeatherReading_exception;

This returns the following result:

WeatherReadingId ReadingTime               Temperature
---------------- ------------------------- ----------------
1                2016-01-01 00:02:00.000   600.32

Now, it might be possible to go back and work on each exception, perhaps extrapolating the value it should have been, based on the previous and the next measurements taken:

(88.22 + 89.22) /2 = 88.72

Of course, if we did that, we would probably want to include another attribute that indicated that a reading was extrapolated rather than an actual reading from the device. This is obviously a very simplistic example, and you could even make the functionality a lot more interesting by using previous readings to determine what is reasonable.
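
If you did resolve the exception that way, the fix itself could be as simple as the following sketch; because 88.72 is within range, the INSTEAD OF trigger routes the row to the main table rather than the exception table (a flag column marking the value as extrapolated would be an addition to the example tables, not something defined here):

INSERT INTO Measurements.WeatherReading (ReadingTime, Temperature)
VALUES ('20160101 0:02', 88.72);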

One note about INSTEAD OF triggers. If you are doing a singleton insert, you may be inclined to use SCOPE_IDENTITY() to fetch the row that is inserted. If you have an INSTEAD OF trigger on the table, this will not give you the desired value:

INSERT  INTO Measurements.WeatherReading (ReadingTime, Temperature)
VALUES ('20160101 0:05',93.22);
SELECT SCOPE_IDENTITY();

This returns NULL. If you need the identity value, you can use the alternate key for the row (in this case the ReadingTime), or use a SEQUENCE generator (as we did in Chapter 6 for one of the tables), which gives you control over the value that is used.
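
For example, a sketch of fetching the surrogate key by way of the alternate key for the row just inserted:

SELECT WeatherReadingId
FROM   Measurements.WeatherReading
WHERE  ReadingTime = '20160101 0:05';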

Forcing No Action to Be Performed on a Table

Our final INSTEAD OF trigger example deals with what’s almost a security issue. Often, users have too much access, and this includes administrators who generally use sysadmin privileges to look for problems with systems. Some tables we simply don’t ever want to be modified. We might implement triggers to keep any user—even a system administrator—from changing the data.

In this example, we’re going to implement a table to hold the version of the database. It’s a single-row “table” that behaves more like a global variable. It’s here to tell the application which version of the schema to expect, so it can tell the user to upgrade or lose functionality:

CREATE SCHEMA System;
GO
CREATE TABLE System.Version
(
    DatabaseVersion varchar(10)
);
INSERT  INTO System.Version (DatabaseVersion)
VALUES ('1.0.12');

Our application always looks to this value to see what objects it expects to be there when it uses them. We clearly don’t want this value to get modified, even if someone has db_owner rights in the database. So, we might apply an INSTEAD OF trigger:

CREATE TRIGGER System.Version$InsteadOfInsertUpdateDeleteTrigger
ON System.Version
INSTEAD OF INSERT, UPDATE, DELETE AS
BEGIN
   SET NOCOUNT ON;
   SET ROWCOUNT 0; --in case the client has modified the rowcount
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
   DECLARE @msg varchar(2000),    --used to hold the error message
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
           @rowsAffected int = (SELECT COUNT(*) FROM inserted);
   IF @rowsAffected = 0 SET @rowsAffected = (SELECT COUNT(*) FROM deleted);
   --no need to complain if no rows affected
   IF @rowsAffected = 0 RETURN;
   --No error handling necessary, just the message.
   --We just put the kibosh on the action.
   THROW 50000, 'The System.Version table may not be modified in production', 16;
END;

Attempts to modify the value, like so

UPDATE System.Version
SET    DatabaseVersion = '1.1.1';
GO

will result in the following:

Msg 50000, Level 16, State 16, Procedure Version$InsteadOfInsertUpdateDeleteTrigger, Line 15
The System.Version table may not be modified in production

Checking the data, you will see that it remains the same:

SELECT *
FROM   System.Version;

This returns the following:

DatabaseVersion
---------------
1.0.12

The administrator, when doing an upgrade, would then have to take the conscious step of running the following code:

ALTER TABLE System.Version
    DISABLE TRIGGER Version$InsteadOfInsertUpdateDeleteTrigger;

Now, you can run the UPDATE statement:

UPDATE System.Version
SET    DatabaseVersion = '1.1.1';

Check the data again,

SELECT *
FROM   System.Version;

and you will see that it has been modified:

DatabaseVersion
---------------
1.1.1

Reenable the trigger using ALTER TABLE...ENABLE TRIGGER:

ALTER TABLE System.Version
    ENABLE TRIGGER Version$InsteadOfInsertUpdateDeleteTrigger;

Using a trigger like this (not disabled, of course, which is something you can catch with a DDL trigger) enables you to “close the gate,” keeping the data safely in the table, even from accidental changes.
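
As a sketch of catching that sort of change, a database-level DDL trigger on ALTER TABLE events would see an ALTER TABLE...DISABLE TRIGGER statement as it executes; the Monitoring.DdlAuditLog table here is hypothetical, and you could just as easily THROW inside the trigger to block the change outright:

CREATE TRIGGER DatabaseAlterTableAudit
ON DATABASE
FOR ALTER_TABLE
AS
BEGIN
    SET NOCOUNT ON;
    --log the text of the ALTER TABLE statement that fired this DDL trigger
    INSERT INTO Monitoring.DdlAuditLog (EventTime, EventDdl)
    SELECT SYSDATETIME(),
           EVENTDATA().value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]',
                             'nvarchar(max)');
END;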

Dealing with Trigger and Constraint Errors

One important thing to consider about triggers and constraints is how you deal with the errors they raise in your error handling. One of the drawbacks to using triggers is that the state of the database after a trigger error is different from what it is after a constraint error. We need to consider two situations when we do a ROLLBACK in a trigger, using an error handler such as we have in this chapter:

  • You aren’t using a TRY-CATCH block: This situation is simple. The batch stops processing in its tracks. SQL Server handles cleanup for any transaction you were in.
  • You are using a TRY-CATCH block: This situation can be a bit tricky, and will depend on what you want to occur.

Take a TRY-CATCH block, such as this one:

BEGIN TRY
        <DML STATEMENT>
END TRY
BEGIN CATCH
        <handle it>
END CATCH;

If the T-SQL trigger rolls back and an error is raised, when you get to the <handle it> block, you won’t be in a transaction. (For the rare use of CLR triggers, you’re in charge of whether the connection ends.) When a CHECK constraint causes the error, or when code simply executes a THROW or RAISERROR, you’ll still be in a transaction. Generically, here’s the CATCH block that I use (as I have used in the triggers written so far in this chapter):

     BEGIN CATCH
              IF @@trancount > 0
                  ROLLBACK TRANSACTION;
              THROW; --will halt the batch or be caught by the caller’s catch block
     END CATCH;

In almost every case, I roll back any transaction, log the error, and then reraise the error. It is simpler to do as a rule, and is 99.997% of the time what will be desired. In order to show the different scenarios that can occur, I will build the following abstract tables for demonstrating trigger and constraint error handling:

CREATE SCHEMA alt;
GO
CREATE TABLE alt.errorHandlingTest
(
    errorHandlingTestId   int CONSTRAINT PKerrorHandlingTest PRIMARY KEY,
    CONSTRAINT CHKerrorHandlingTest_errorHandlingTestId_greaterThanZero
           CHECK (errorHandlingTestId > 0)
);

Note that if you try to put a value of 0 or less into the errorHandlingTestId column, it will cause a constraint error. In the trigger, the only statement in the TRY section will be to raise an error, so no matter what input is sent to the table, it will be discarded and an error raised; as we have done previously, we will use ROLLBACK if there is a transaction in progress and then do a THROW.

CREATE TRIGGER alt.errorHandlingTest$insertTrigger
ON alt.errorHandlingTest
AFTER INSERT
AS
    BEGIN TRY
        THROW 50000, 'Test Error',16;
    END TRY
    BEGIN CATCH
         IF @@TRANCOUNT > 0
                ROLLBACK TRANSACTION;
         THROW;
    END CATCH;

The first thing to understand is that when a normal constraint causes the DML operation to fail, the batch will continue to operate:

--NO Transaction, Constraint Error
INSERT alt.errorHandlingTest
VALUES (-1);
SELECT 'continues';

You will see that the error is raised, and then the SELECT statement is executed:

Msg 547, Level 16, State 0, Line 913
The INSERT statement conflicted with the CHECK constraint "CHKerrorHandlingTest_errorHandlingTestId_greaterThanZero". The conflict occurred in database "Chapter7", table "alt.errorHandlingTest", column ’errorHandlingTestId’.
The statement has been terminated.
---------
continues

However, do this with a trigger error:

INSERT alt.errorHandlingTest
VALUES (1);
SELECT 'continues';

This returns the following and does not get to the SELECT ’continues’ line at all:

Msg 50000, Level 16, State 16, Procedure errorHandlingTest$insertTrigger, Line 6
Test Error

This fairly elegant stoppage occurs because THROW stops the batch. If you use RAISERROR instead, the batch will still stop (since the trigger rolled back the transaction), but you will also get an additional message about the batch being aborted by the trigger. And as I will show shortly, if you THROW an error in a trigger without rolling back the transaction, things end in a far less elegant manner, leaving you with an uncommittable transaction.

There are also differences in dealing with errors from constraints and triggers when you are using TRY-CATCH and transactions. Take the following batch. The error will be a constraint type. The big thing to understand is the state of a transaction after the error. This is definitely an issue that you have to be careful with.

BEGIN TRY
    BEGIN TRANSACTION
    INSERT alt.errorHandlingTest
    VALUES (-1);
    COMMIT;
END TRY
BEGIN CATCH
    SELECT  CASE XACT_STATE()
                WHEN 1 THEN 'Committable'
                WHEN 0 THEN 'No transaction'
                ELSE 'Uncommitable tran' END as XACT_STATE
            ,ERROR_NUMBER() AS ErrorNumber
            ,ERROR_MESSAGE() as ErrorMessage;
    IF @@TRANCOUNT > 0
          ROLLBACK TRANSACTION;
END CATCH;

This returns the following:

XACT_STATE        ErrorNumber ErrorMessage
----------------- ----------- ----------------------------------------------------------
Committable       547         The INSERT statement conflicted with the CHECK constraint...

The transaction is still in force and in a stable state. If you wanted to continue on in the batch doing whatever you need to do, it is certainly fine to do so. However, if you end up using any triggers to enforce data integrity, the situation will be different (and not entirely obvious to the programmer). In the next batch, we will use 1 as the value, so we get a trigger error instead of a constraint one:

BEGIN TRANSACTION
   BEGIN TRY
        INSERT alt.errorHandlingTest
        VALUES (1);
        COMMIT TRANSACTION;
   END TRY
BEGIN CATCH
    SELECT  CASE XACT_STATE()
                WHEN 1 THEN 'Committable'
                WHEN 0 THEN 'No transaction'
                ELSE 'Uncommitable tran' END as XACT_STATE
            ,ERROR_NUMBER() AS ErrorNumber
            ,ERROR_MESSAGE() as ErrorMessage;
    IF @@TRANCOUNT > 0
          ROLLBACK TRANSACTION;
END CATCH;

This returns the following:

XACT_STATE        ErrorNumber ErrorMessage
----------------- ----------- --------------------------------------------
No transaction    50000       Test Error

In the error handler of our batch, the session is no longer in a transaction, since we rolled the transaction back in the trigger. However, unlike the case without an error handler, we continue on in the batch rather than the batch dying. Note, however, that there is no way to recover from an issue in a trigger without resorting to trickery (like storing status in a temporary table or table variable instead of throwing an error or rolling back, which is highly discouraged if you want to keep your code standard and easy to follow).

The unpredictability of the transaction state is why we check @@TRANCOUNT to see whether we need to do a rollback. In this case, the error message raised in the trigger was bubbled up into this CATCH statement, so we are in an error state that the CATCH block handles.

As a final demonstration, let’s look at one other case, and that is where you raise an error in a trigger without rolling back the transaction:

ALTER TRIGGER alt.errorHandlingTest$insertTrigger
ON alt.errorHandlingTest
AFTER INSERT
AS
    BEGIN TRY
          THROW 50000, 'Test Error',16;
    END TRY
    BEGIN CATCH
         --Commented out for test purposes
         --IF @@TRANCOUNT > 0
         --    ROLLBACK TRANSACTION;
         THROW;
    END CATCH;

Now, causing an error in the trigger:

BEGIN TRY
    BEGIN TRANSACTION
    INSERT alt.errorHandlingTest
    VALUES (1);
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    SELECT  CASE XACT_STATE()
                WHEN 1 THEN 'Committable'
                WHEN 0 THEN 'No transaction'
                ELSE 'Uncommitable tran' END as XACT_STATE
            ,ERROR_NUMBER() AS ErrorNumber
            ,ERROR_MESSAGE() as ErrorMessage;
     IF @@TRANCOUNT > 0
          ROLLBACK TRANSACTION;
END CATCH;

The result will be as follows:

XACT_STATE          ErrorNumber ErrorMessage
------------------- ----------- --------------------------------------------
Uncommitable tran   50000       Test Error

You get an uncommittable transaction, which is also referred to as a doomed transaction. An uncommittable transaction is still in force but can never be committed and must eventually be rolled back.
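
If you want a handler to make that decision explicitly, here is a sketch of a CATCH block that tests XACT_STATE() before acting; functionally it ends up doing the same rollback as the simpler @@TRANCOUNT check used throughout this chapter:

BEGIN CATCH
    IF XACT_STATE() = -1        --doomed transaction; a COMMIT would fail
        ROLLBACK TRANSACTION;
    ELSE IF XACT_STATE() = 1    --still committable, but roll back to stay consistent
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;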

The point to all of this is that you need to be careful when you code your error handling to do a few things:

  • Keep things simple: Do only as much handling as you need, and generally treat errors as unrecoverable unless recovery is truly necessary. The key is to deal with the errors and get back out to a steady state so that the client can know what to try again.
  • Keep things standard: Set a standard, and follow it. Always use the same handler for all your code in all cases where it needs to do the same things.
  • Test well: The most important bit of information is to test and test again all the possible paths your code can take.

To always get a consistent situation in my code, I pretty much always use a standard handler. Before every data manipulation statement, I set a message in a variable that describes what is being executed; in the handler, I use it as the first half of the error message and append the system message to describe what went wrong (sometimes using a constraint-mapping function as mentioned earlier, although usually that is overkill since the UI traps all errors):

BEGIN TRY
    BEGIN TRANSACTION;
    DECLARE @errorMessage nvarchar(4000) = 'Error inserting data into alt.errorHandlingTest';
    INSERT alt.errorHandlingTest
    VALUES (-1);
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    --I also add in the stored procedure or trigger where the error
    --occurred also when in a coded object
    SET @errorMessage = CONCAT( COALESCE(@errorMessage,''), ' ( System Error: ',
                                ERROR_NUMBER(),':',ERROR_MESSAGE(),
                                ' : Line Number:',ERROR_LINE(),')');
        THROW 50000,@errorMessage,16;
END CATCH;

Now, this returns the following:

Msg 50000, Level 16, State 16, Line 18
Error inserting data into alt.errorHandlingTest ( System Error: 547:The INSERT statement conflicted with the CHECK constraint "CHKerrorHandlingTest_errorHandlingTestId_greaterThanZero".
The conflict occurred in database "Chapter7", table "alt.errorHandlingTest", column ’errorHandlingTestId’: Line Number:4)

This returns the manually created message and the system message, as well as where the error occurred. In Appendix B, I outline some additional methods you might take to log errors, using a stored procedure I call ErrorHandling.ErrorLog$insert, depending on whether the error is something that you expected to occur on occasion or is something (as I said about triggers) that really shouldn’t happen. If I was implementing the code in such a way that I expected errors to occur, I might also include a call to something like the ErrorHandling.ErrorMap$MapError procedure that was discussed earlier to beautify the error message value for the system error.

Error handling is definitely a place where SQL Server’s T-SQL language is lacking in comparison to almost any other language, but it took a leap forward in 2005 with TRY…CATCH and improved again in 2012 with the ability to rethrow an error using THROW, which we have used in the standard trigger template. Of course, the most important part of writing error-handling code is the testing you do to make sure that it works!

Best Practices

The main best practice is to use the right tool for the job. There are many tools in (and around) SQL to use to protect the data. Picking the right tool for a given situation is essential. For example, every column in every table could be defined as nvarchar(max). Using CHECK constraints, you could then constrain the values to look like almost any datatype. It sounds silly perhaps, but it is possible. But you know better after reading Chapter 6 and now this chapter, right?

When choosing your method of protecting data, it's best to apply the following types of objects, in this order (a combined sketch of the first three follows the list):

  • Datatypes: Choosing the right type is the first line of defense. If all your values need to be integers between 1 and 10,000, just using an integer datatype takes care of one part of the rule immediately.
  • Defaults: Though you might not think of defaults as data-protection resources, you can use them to automatically set columns whose purpose might not be apparent to the user, letting the database supply a suitable value for the column.
  • Simple CHECK constraints: These are important in ensuring that your data is within specifications. You can use almost any scalar functions (user-defined or system), as long as you end up with a single logical expression.
  • Complex CHECK constraints, possibly using functions: These can be very interesting parts of a design but should be used sparingly, and you should rarely use a function that references the same table's data, because the results can be inconsistent.
  • Triggers: These are used to enforce rules that are too complex for CHECK constraints. Triggers allow you to build pieces of code that fire automatically on any INSERT, UPDATE, and DELETE operation that’s executed against a single table.
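
As a minimal illustration of the first three items, the following sketch combines a datatype choice, a DEFAULT, and a simple CHECK constraint; the alt.widget table and its rules are purely hypothetical:

CREATE TABLE alt.widget
(
    widgetId        int          NOT NULL
        CONSTRAINT PKAlt_widget PRIMARY KEY,
    --datatype: int alone rules out fractional and non-numeric values
    quantity        int          NOT NULL
        --simple CHECK constraint: enforce a 1 to 10,000 range
        CONSTRAINT chkAlt_widget_quantity_range
            CHECK (quantity BETWEEN 1 AND 10000),
    --DEFAULT: the database supplies the value when the caller does not
    rowCreatedTime  datetime2(0) NOT NULL
        CONSTRAINT dfltAlt_widget_rowCreatedTime DEFAULT (SYSDATETIME())
);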

Don’t be afraid to enforce rules in more than one location. Although having rules as close to the data storage as possible is essential to trusting the integrity of the data when you use the data, there’s no reason why the user needs to suffer through a poor user interface with a bunch of simple text boxes with no validation.

Of course, not all data protection can be done at the object level, and some will need to be managed using client code or even asynchronous processes that execute long after the data has been entered. The major difference between user code and the methods we have discussed so far in the book is that SQL Server–enforced integrity is automatic and cannot (accidentally) be overridden. It is also far better at dealing with concurrent users making frequent changes to data.

Summary

Now, you’ve finished the task of developing the data storage for your databases. If you’ve planned out your data storage and data protection layer, the only bad data that can get into your system has nothing to do with the design (if a user wants to type the name John as “Jahn” or even “IdiotWhoInsultedMe”—stranger things have happened!—there’s nothing that can be done in the database server to prevent it). As an architect or programmer, you can’t possibly stop users from putting the names of pieces of equipment in a table named Employee. There’s no semantic checking built in, and it would be impossible to do so without tremendous work and tremendous computing power. Only education can take care of this. Of course, it helps if you’ve given the users tables to store all their data, but still, users will be users.

The most we can do in SQL Server is to make sure that the data is fundamentally sound: that it at least makes sense on its own, containing values that are legal regardless of whether the decisions behind them were correct. If your HR employees keep trying to pay your new programmers minimum wage, the database likely won't care, but if they try to say that new employees make a negative salary, actually owing the company money for the privilege to come to work, well, that is probably not going to fly, even if the job is video game tester or some other highly desirable occupation. During this process, we used the resources that SQL Server gives you to protect your data from invalid values that would otherwise have to be checked for again later.

Use check constraints to protect data in a single row. You can access any column in the table, but only the data in that one row. You can access data in other tables using user-defined functions, if you so desire, but note that, just as when building your own foreign key checking, you need to consider changes to both the table you are protecting and the table you are referencing. Use triggers for the most complex checking, especially when you need to reference other rows in the same table, like checking balances. Triggers can also introduce side effects, such as maintaining denormalized data when it is needed, or even calling other procedures like sp_send_dbmail. The most important part of writing a trigger is understanding that it executes once per DML statement, whether that statement affected 1 row, 1 billion rows, or none at all. Plan ahead.
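
To keep that per-statement behavior front of mind, here is a minimal sketch of a multirow-aware balance check; the alt.accountTransaction table and its rule are purely hypothetical, and this is just the core of the check rather than the full trigger template used earlier in the chapter:

CREATE TRIGGER alt.accountTransaction$insertTrigger
ON alt.accountTransaction
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    --the inserted virtual table holds every row from the statement, whether
    --that is 1 row or 1 billion, so the check must be written set based
    IF EXISTS (SELECT accountNumber
               FROM   alt.accountTransaction
               WHERE  accountNumber IN (SELECT accountNumber FROM inserted)
               GROUP  BY accountNumber
               HAVING SUM(amount) < 0)
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'The operation would take an account balance below zero', 16;
    END;
END;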

Once you’ve built and implemented a set of appropriate data-safeguarding resources, you can then trust that the data in your database has been validated. You should never need to revalidate keys or values in your data once it’s stored in your database, but it’s a good idea to do random sampling, so you know that no integrity gaps have slipped by you, especially during the full testing process.
