CHAPTER 13

image

Architecting Your System

Men’s arguments often prove nothing but their wishes.

Charles Caleb Colton, English cleric, writer, and collector

So far in the book, we started with how one should design a database without regard for technology, then covered how one might implement that database using SQL Server. The process has been one of building one precept upon another. The one topic we really haven’t broached is data access. But as SQL Server grows in complexity, we are given more and more choices. It is essential for the architect who is creating a new system, or even changing an existing system, to think about the architectural building blocks and make sure that a number of things are matched up:

  • Technical needs: What does your application really need? Amazing throughput? Massive data quantities? Extreme data integrity?
  • Human abilities: My personal feeling is to never limit what tech you use to the abilities of the programmers, but not everyone can do everything. Plus, some methods of architecting the code layer for a system are more time consuming, no matter the programmer.
  • Tools: What tools will the programmers who are building the front end be using? Very few data architects will smile happily when the three letters ORM are uttered, but object-relational mapping tools are popular.

In this chapter I will provide a rather brief overview of some of the architectural decisions you need to make, along with the pros and cons of each—in other words, “opinions” (technical-based opinions, but opinions nevertheless). Regardless of whether your application is a good, old-fashioned client-server application or a multitier web application, uses an ORM, or uses some new application pattern that hasn’t been created (or I haven’t heard of!) as I write this, data must be stored in and retrieved from tables. Most of the advice presented in this chapter will be relevant regardless of the type of application you’re building.

In this chapter, I will present a number of my own opinions on how to use stored procedures, ad hoc SQL, and the CLR. Each of these opinions is based on years of experience working with SQL Server technologies, but I am not so set in my ways that I cannot see the point of the people on the other side of any fence—anyone whose mind cannot be changed is no longer learning. So if you disagree with this chapter, feel free to e-mail me at [email protected]; you won’t hurt my feelings and we will both probably end up learning something in the process.

In this chapter, I am going to discuss the following topics:

  • Choosing the engine for your needs: I will give a brief overview of how the integration between the two engines works, and how to decide when to use the in-memory model.
  • Using ad hoc SQL: Formulating queries in the application’s presentation and manipulation layer (typically functional code stored in objects, such as .NET or Java, and run on a server or a client machine).
  • Using stored procedures: Creating an interface between the presentation/manipulation layer and the data layer of the application. Note that views and functions, as well as procedures, also form part of this data-access interface. You can use all three of these object types.
  • Using the CLR in T-SQL: I will present some basic opinions on the usage of the CLR within the realm of T-SQL.

Each section will analyze some of the pros and cons of each approach, in terms of flexibility, security, performance, and so on. Along the way, I’ll offer some personal opinions on optimal architecture and give advice on how best to implement both types of access. Again, bear in mind that a lot of this is just my personal opinion.

Image Note  I previously mentioned object-relational mapping tools (examples of which include Hibernate, Spring, or even the ADO.NET Entity Framework), so you might be thinking that I’m going to cover them in this chapter. In the end, however, these tools are really using ad hoc access, in that they are generating SQL. For the sake of this book, they should be lumped into the ad hoc group, unless they are used with stored procedures (which is pretty rare).

The most difficult part of a discussion of this sort is that the actual arguments that go on are not so much about right and wrong, but rather the question of which method is easier to program and maintain. SQL Server and Visual Studio .NET give you lots of handy-dandy tools to build your applications, mapping objects to data, and as the years pass, this becomes even truer with the Entity Framework and many of the ORM tools out there.

The problem is that these tools don’t always take enough advantage of SQL Server’s best practices to build applications in their most common form of usage. Doing things in a best practice manner would mean doing a lot of coding manually, without the ease of automated tools to help you. Some organizations do this manual work with great results, but such work is rarely going to be popular with developers who have never had to support an application that is extremely hard to optimize once the system is in production.

A point that I really should make clear is that I feel that the choice of data-access strategy shouldn’t be linked to the methods used for data validation nor should it be linked to whether you use (or how much you use) check constraints, triggers, and suchlike. If you have read the entire book, you should be kind of tired of hearing how much I feel that you should do every possible data validation on the SQL Server data that can be done without making a maintenance nightmare. Fundamental data rules that are cast in stone should be done on the database server in constraints and triggers at all times so that these rules can be trusted by the user (for example, an ETL process). On the other hand, procedures or client code is going to be used to enforce a lot of the same rules, plus all of the mutable business rules too, but in either situation, non-data tier rules can be easily circumvented by using a different access path. Even database rules can be circumvented using bulk loading operations, so be careful there too.

Image Note  While I stand by all of the concepts and opinions in this entire book (typos notwithstanding), I definitely do not suggest that your educational journey end here. Please read other people’s work, try out everything, and form your own opinions. If some day you end up writing a competitive book to mine, the worst thing that happens is that people have another resource to turn to.

Choosing the Engine for Your Needs

Throughout the latter half of the book, you have seen some discussion of the in-memory OLTP tables, and how they are different (and how they are similar). However, we haven’t talked too much about the overall architecture, sticking mostly to the ways the in-memory OLTP tables themselves have worked when used in a “normal” T-SQL statement. In this section I want to discuss the architecture a bit more, to help you get a feel for whether or not it is something you should use for systems you are building. In the rest of this chapter, I will mention the differences between the engines only when they considerably affect the discussion of data access.

When we talk about the in-memory engine, it is basically a complete, complementary storage and query-processing system. Data is stored in the same database along with on-disk data, but in a completely different format. All data you want to store in these tables must fit within RAM, including (as discussed in Chapter 11) versions of rows that are being changed. Logging is different, in that your indexes are never logged, and you can have completely unlogged (not durable) tables. Microsoft has made it so that using the in-memory model is super easy, other than a few things:

  • You generally need at least double the RAM of the data you will store: This allows for row versions, indexes, and so on. Unless you have small data sets, this could be tough for a lot of organizations.
  • The concurrency model: All concurrency is implemented using a snapshot mentality. After you start a transaction and it uses data in the in-memory container, you cannot see any other user’s changes.
  • Inter-table/inter-row data integrity is more difficult: Unless you can do your data integrity checks using foreign keys and uniqueness constraints, it can be difficult to check data because of the concurrency model, particularly if lots of data is changing in a lot of tables.

Fortunately, the logical model of our database needn’t change that much to use the in-memory model, and neither do our tables for the most part. Even better for a designer, other than a few data integrity concerns, there is no real difficulty with having a few tables in each engine. An example (sketched in code after this list) could be

  • Sales order tables in in-memory tables: We need massive throughput in these tables, with hundreds of orders per second coming in.
  • Customer data in on-disk tables: Data changes some, but many of the orders come from repeat customers. You could implement business rules that customers are not deleted (at least not using a simple DELETE statement, but perhaps as part of some cleanup processes that are carefully created). The customer identifiers are cached in the code when the customer logs in, and read contention is far less of a concern than write contention.
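
To make that split concrete, the following is a minimal DDL sketch of the two tables side by side. The table and column definitions are hypothetical, and the memory-optimized table assumes a database that already has a MEMORY_OPTIMIZED_DATA filegroup:

--the high-throughput sales order data lives in the in-memory engine
CREATE TABLE Sales.SalesOrder
(
    SalesOrderId int NOT NULL,
    ContactId    int NOT NULL,
    OrderDate    date NOT NULL,
    CONSTRAINT PKSalesOrder PRIMARY KEY
        NONCLUSTERED HASH (SalesOrderId) WITH (BUCKET_COUNT = 1000000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

--the customer data stays in a typical on-disk table
CREATE TABLE Sales.Customer
(
    CustomerId   int NOT NULL CONSTRAINT PKCustomer PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);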

Interaction between the engines is seamless for the programmer. As we have seen throughout the book when accessing in-memory tables, we simply execute a normal SQL statement. This uses what is called query “interop” mode, letting typical interpreted T-SQL access the data in the in-memory structures.

You can see large gains in performance simply by using interop mode, but where the engine truly shines is when you use natively compiled objects. While they are written using T-SQL (a heavily limited version of T-SQL, but it looks just like T-SQL with a few new settings), they are compiled into binary code like a typical C# program (just like the in-memory tables). This gives you all of the benefits of a compiled language coupled with the benefits of the relational engine.
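
To give a feel for the syntax, here is a minimal sketch of a natively compiled procedure that inserts into the hypothetical memory-optimized Sales.SalesOrder table from the previous sketch; note the required SCHEMABINDING and the ATOMIC block with its isolation level and language settings:

CREATE PROCEDURE Sales.SalesOrder$Insert
    @SalesOrderId int, @ContactId int, @OrderDate date
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    --the body is compiled to machine code and may only reference
    --memory-optimized tables
    INSERT INTO Sales.SalesOrder (SalesOrderId, ContactId, OrderDate)
    VALUES (@SalesOrderId, @ContactId, @OrderDate);
END;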

My goal is to just give you a few guidelines on when to choose to use the in-memory engine. Microsoft provides its own list at https://msdn.microsoft.com/en-us/library/dn133186.aspx, and since this is such a new engine (the 2014 version was extremely limited, whereas the 2016 version has become far more usable), Microsoft may expand the expected use cases before I write a new edition of this book. My list of characteristics for potential utilization includes

  • High concurrency needs with a low chance of collision: When you have the need to load, modify, etc. many rows constantly/continuously, but there is little chance that two connections will contend with one another. The way concurrency collisions are resolved can be costlier than short-term blocks.
  • Limited need for transaction isolation/short transactions: The longer the transactions your system needs to make, the more complicated things will get, not just due to collisions, but also because versions of rows are held so every connection gets their consistent view.
  • Limited search criteria that need to be optimized: With only 8 indexes per table, you likely can’t have 20 different ways you want to search for data in an optimum manner.
  • Limited data integrity needs: Because all access occurs in a snapshot-based transaction, you can’t see any other connection’s changes made after your transaction starts (and the validation checks for serializable transactions are scans of tables or indexes at commit time).

Basically speaking, the best candidates are the tables in a strict OLTP workload with the highest concurrent usage. But memory-optimized tables and natively compiled objects are not a way to fix poorly written code. If you have bad code, this could easily make it worse. Your applications will probably need at least some changes to ensure the in-memory engine works well for them. As in all things, be sure to test out what works, and what works really well. We have covered many of the limitations and differences between the two engines, but there is more to read and learn than I have covered, particularly for management of these processes.

Ad Hoc SQL

Ad hoc SQL is sometimes referred to as “straight SQL” and generally refers to the formulation of SELECT, INSERT, UPDATE, and DELETE (as well as any other) statements textually in the client. These statements are then sent to SQL Server either individually or in batches of multiple statements to be syntax checked, compiled, optimized (producing a plan), and executed. SQL Server may use a cached plan from a previous execution, but it will have to pretty much exactly match the text of one call to another to do so (the only difference can be some parameterization of literals, which we will discuss a little later in the chapter). You can also compile your ad hoc SQL into a temporary plan.

I will make no distinction between ad hoc calls that are generated manually and those that use a middleware setup like any ORM: from SQL Server’s standpoint, a string of characters is sent to the server and interpreted at runtime. So whether your method of generating these statements is good or poor is of no concern to me in this discussion, as long as the SQL generated is well formed and protected from users’ malicious actions (injection attacks generally being the biggest offender). The reason I don’t care where the ad hoc statements come from is that the advantages and disadvantages for the database support professionals are pretty much the same, and in fact, statements generated from a middleware tool can be worse, because you may not be able to change the format or makeup of the statements, leaving you with no easy way to tune statements, even if you can modify the source code.

Sending queries as strings of text is the way that plenty of tools tend to converse with SQL Server, and is, for example, how SQL Server Management Studio does its interaction with the server metadata. If you have never used Extended Events to watch the SQL that any of the management and development tools use, you should; just don’t use it as your guide for building your OLTP system. It is, however, a good way to learn where some bits of metadata that you can’t figure out come from.

There’s no question that users will perform some ad hoc queries against your system, especially when you simply want to write a query and execute it just once. However, the more pertinent question is: should you be using ad hoc SQL when building the permanent interface to an OLTP system’s data?

Image Note  This topic doesn’t include ad hoc SQL statements executed from stored procedures (commonly called dynamic SQL), which I’ll discuss in the section “Stored Procedures.”

Advantages

Using uncompiled ad hoc SQL has the following advantages over building compiled stored procedures:

  • Runtime control over queries: Queries are built at runtime, without having to know every possible query that might be executed. This can lead to better performance as queries can be formed at runtime; you can retrieve only necessary data for SELECT queries or modify data that’s changed for UPDATE operations.
  • Flexibility over shared plans and parameterization: Because you have control over the queries, you can more easily build queries at runtime that use the same plans and even can be parameterized as desired, based on the situation.

Runtime Control over Queries

Unlike stored procedures, which are prebuilt and stored in the SQL Server system tables, ad hoc SQL is formed at the time it’s needed: at runtime. Hence, it doesn’t suffer from some of the inflexible requirements of stored procedures. For example, say you want to build a user interface to a list of customers. You can add several columns to the SELECT clause, based on the tables listed in the FROM clause. It’s simple to build a list of columns into the user interface that the user can use to customize his or her own list. Then the program can issue the list request with only the columns in the SELECT list that are requested by the user. Because some columns might be large and contain quite a bit of data, it’s better to send back only the columns that the user really desires instead of a bunch of columns the user doesn’t care about.

For instance, consider that you have the following table to document contacts to prospective customers (it’s barebones for this example). In each query, you might return the primary key but show or not show it to the user based on whether the primary key is implemented as a surrogate or natural key—it isn’t important to our example either way. You can create this table in any database you like. In the sample code, I’ve created a database named Chapter13 that will allow in-memory tables by default.

CREATE SCHEMA Sales;
GO
CREATE TABLE Sales.Contact
(
    ContactId   int CONSTRAINT PKContact PRIMARY KEY,
    FirstName   varchar(30),
    LastName    varchar(30),
    CompanyName varchar(100),
    SalesLevelId  int, --real table would implement as a foreign key
    ContactNotes  varchar(max),
    CONSTRAINT AKContact UNIQUE (FirstName, LastName, CompanyName)
);
--a few rows to show some output from queries
INSERT INTO Sales.Contact
            (ContactId, FirstName, Lastname, CompanyName, SaleslevelId, ContactNotes)
VALUES( 1,'Drue','Karry','SeeBeeEss',1,
           REPLICATE ('Blah...',10) + 'Called and discussed new ideas'),
      ( 2,'Jon','Rettre','Daughter Inc',2,
           REPLICATE ('Yada...',10) + 'Called, but he had passed on');

One user might want to see the person’s name and the company, plus the end of the ContactNotes, in his or her view of the data:

SELECT  ContactId, FirstName, LastName, CompanyName,
        RIGHT(ContactNotes,30) as NotesEnd
FROM    Sales.Contact;

So something like:

ContactId   FirstName  LastName  CompanyName    NotesEnd
----------- ---------- --------- -------------- ----------------------------------
1           Drue       Karry     SeeBeeEss      Called and discussed new ideas
2           Jon        Rettre    Daughter Inc   ..Called, but he had passed on

Another user might want (or need) to see less:

SELECT ContactId, FirstName, LastName, CompanyName
FROM Sales.Contact;

Which returns

ContactId   FirstName   LastName   CompanyName
----------- ----------- ---------- ---------------
1           Drue        Karry      SeeBeeEss
2           Jon         Rettre     Daughter Inc

And yet another user may want to see all columns in the table, plus maybe some additional information. Allowing the user to choose the columns for output can be useful. Consider how the file-listing dialog works in Windows, as shown in Figure 13-1.

9781484219720_13_Fig1.jpg

Figure 13-1. The Windows file-listing dialog

You can see as many or as few of the attributes of a file in the list as you like, based on some metadata you set on the directory. This is a useful method of letting the users choose what they want to see. Let’s take this one step further. Consider that the Contact table is then related to a table that tells us if a contact has purchased something:

CREATE TABLE Sales.Purchase
(
    PurchaseId int CONSTRAINT PKPurchase PRIMARY KEY,
    Amount      numeric(10,2),
    PurchaseDate date,
    ContactId   int
        CONSTRAINT FKContact$hasPurchasesIn$Sales_Purchase
            REFERENCES Sales.Contact(ContactId)
);
INSERT INTO Sales.Purchase(PurchaseId, Amount, PurchaseDate, ContactId)
VALUES (1,100.00,'2016-05-12',1),(2,200.00,'2016-05-10',1),
       (3,100.00,'2016-05-12',2),(4,300.00,'2016-05-12',1),
       (5,100.00,'2016-04-11',1),(6,5500.00,'2016-05-14',2),
       (7,100.00,'2016-04-01',1),(8,1020.00,'2016-06-03',2);

Now consider that you want to calculate the sales totals and dates for the contact and add these columns to the allowed pool of choices. By tailoring the output when transmitting the results of the query back to the user, you can save bandwidth, CPU, and disk I/O. As I’ve stressed, values such as this should usually be calculated rather than stored, especially when working on an OLTP system.

In this case, consider the following two possibilities. If the user asks for a sales summary column, the client will send the whole query:

SELECT  Contact.ContactId, Contact.FirstName, Contact.LastName
        ,Sales.YearToDateSales, Sales.LastSaleDate
FROM   Sales.Contact as Contact
          LEFT OUTER JOIN
             (SELECT ContactId,
                     SUM(Amount) AS YearToDateSales,
                     MAX(PurchaseDate) AS LastSaleDate
              FROM   Sales.Purchase
              WHERE  PurchaseDate >= --the first day of the current year
                        DATEADD(day, 0, DATEDIFF(day, 0, SYSDATETIME() )
                          - DATEPART(dayofyear,SYSDATETIME() ) + 1)
              GROUP  by ContactId) AS sales
              ON Contact.ContactId = Sales.ContactId
WHERE   Contact.LastName LIKE 'Rett%';

This returns the following:

ContactId   FirstName  LastName   YearToDateSales  LastSaleDate
----------- ---------- ---------- ---------------- ------------
2           Jon        Rettre     6620.00          2016-06-03

If the user doesn’t ask for a sales summary column, the client will send only the code that is not commented out in the following query:

SELECT  Contact.ContactId, Contact.FirstName, Contact.LastName
        --,Sales.YearToDateSales, Sales.LastSaleDate
FROM   Sales.Contact as Contact
          --LEFT OUTER JOIN
          --   (SELECT ContactId,
          --           SUM(Amount) AS YearToDateSales,
          --           MAX(PurchaseDate) AS LastSaleDate
          --    FROM   Sales.Purchase
          --    WHERE  PurchaseDate >= --the first day of the current year
          --              DATEADD(day, 0, DATEDIFF(day, 0, SYSDATETIME() )
          --                - DATEPART(dayofyear,SYSDATETIME() ) + 1)
          --    GROUP  by ContactId) AS sales
          --    ON Contact.ContactId = Sales.ContactId
WHERE   Contact.LastName LIKE 'Karr%';

This returns only the following:

ContactId   FirstName    LastName
----------- ------------ ------------
1           Drue         Karry

Not wasting the resources to do calculations that aren’t needed can save a lot of system resources if the aggregates in the derived table were very costly to execute. Even if it isn’t terribly costly, it is still a waste of resources.

In the same vein, when using ad hoc calls, it’s trivial (from a SQL standpoint) to build UPDATE statements that include only the columns that have changed in the set lists, rather than updating all columns, as can be necessary for a stored procedure. For example, take the contact columns from earlier: FirstName, LastName, CompanyName, and so on. You could just update all columns:

UPDATE Sales.Contact
SET    FirstName = 'Drew',
       LastName = 'Carey',
       SalesLevelId = 1, --no change
       CompanyName = 'CBS',
       ContactNotes = 'Blah...Blah...Blah...Blah...Blah...Blah...Blah...Blah...Blah...'
                      + 'Blah...Called and discussed new ideas' --no change
WHERE ContactId = 1;

But what if only the FirstName and LastName values change? What if the CompanyName column is part of an index, and it has data validations that take three seconds to execute? How do you deal with varchar(max) columns (or other long types)? Say the ContactNotes column for the row with ContactId = 1 contains 300MB. Execution could take far more time than is desirable if the application passes the entire value back and forth each time. Using ad hoc SQL, to update the FirstName and LastName columns only (which probably has less overhead than changing the SalesLevelId value, which minimally has a foreign key to check), you can simply execute the following code:

UPDATE Sales.Contact
SET    FirstName = 'John',
       LastName = 'Ritter'
WHERE  ContactId = 2;

Some of this can be done with dynamic SQL calls built into the stored procedure, but it’s far easier to know if data changed right at the source where the data is being edited, rather than having to check the data beforehand. For example, you could have every data-bound control implement a “data changed” property, and perform a column update only when the original value doesn’t match the value currently displayed. In a stored-procedure-only architecture, having multiple update procedures is not necessarily out of the question, particularly when it is very costly to modify a given column.

One place where using ad hoc SQL can produce more reasonable code is in the area of optional parameters. Say that, in your query to the Sales.Contact table, your UI allowed you to filter on either FirstName, LastName, or both. For example, take the following code to filter on both FirstName and LastName:

SELECT FirstName, LastName, CompanyName
FROM   Sales.Contact
WHERE  FirstName LIKE 'J%'
  AND  LastName LIKE  'R%';

What if the user only needed to filter by last name? Sending the ’%’ wildcard for FirstName can cause code to perform less than adequately, especially when the query is parameterized. (I’ll cover query parameterization in the next section, “Performance.”)

SELECT FirstName, LastName, CompanyName
FROM   Sales.Contact
WHERE  FirstName LIKE '%'
  AND  LastName LIKE 'Carey%';

If you think this looks like a very silly query to execute, you are generally correct. If you were writing this query to be used repeatedly, you would write the more logical version of this query, without the superfluous condition:

SELECT FirstName, LastName, CompanyName
FROM   Sales.Contact
WHERE  LastName LIKE 'Carey%';

This doesn’t require any difficult coding. Just remove one of the criteria from the WHERE clause, and the optimizer needn’t consider the other. What if you want to OR the criteria instead? Simply build the query with OR instead of AND. This kind of flexibility is one of the biggest positives to using ad hoc SQL calls.
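
For example, the client could simply generate the following instead:

SELECT FirstName, LastName, CompanyName
FROM   Sales.Contact
WHERE  FirstName LIKE 'J%'
   OR  LastName LIKE 'R%';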

Image Note  The ability to change the statement programmatically will also play to the downside of any dynamically built statement. With just two parameters, we have three possible variants of the statement to be used, so we have to consider performance for all three when we are building our test cases.

For a stored procedure, you might need to write code that functionally works in a manner such as the following:

IF @FirstNameValue <> '%'
        SELECT FirstName, LastName, CompanyName
        FROM   Sales.Contact
        WHERE  FirstName LIKE @FirstNameLike
          AND  LastName LIKE @LastNameLike;
ELSE
        SELECT FirstName, LastName, CompanyName
        FROM   Sales.Contact
        WHERE  LastName LIKE @LastNameLike;

This is generally a bad idea, as you ideally want every query in a procedure to be executed. Even worse, you can do something messy like the following in your WHERE clause so that if any value is passed in, it uses it, or uses '%' otherwise:

WHERE  FirstName LIKE ISNULL(NULLIF(LTRIM(@FirstNameLike) +'%','%'),FirstName)
  AND LastName LIKE ISNULL(NULLIF(LTRIM(@LastNameLike) +'%','%'),LastName)

Unfortunately though, this often does not optimize very well (particularly with a lot of data) because the optimizer has a hard time optimizing for factors that can change based on different values of a variable—leading to the need for the branching solution mentioned previously to optimize for specific parameter cases. A better way to do this with stored procedures might be to create two stored procedures—one with the first query and another with the second query—especially if you need extremely high-performance access to the data. You’d change this to the following code:

IF @FirstNameValue <> '%'
        EXECUTE Sales.Contact$Get @FirstNameLike, @LastNameLike;
ELSE
        EXECUTE Sales.Contact$GetByLastNameLike @LastNameLike;

You can do some of this kind of ad hoc SQL writing using dynamic SQL in stored procedures. However, you might have to write a good number of these sorts of IF blocks to work out which parameters are and aren’t applicable, for parameters of various datatypes. Because a UI can know which parameters are applicable, handling this situation using ad hoc SQL can be far easier. Getting this kind of flexibility is the main reason that I use an ad hoc SQL call in an application (usually embedded in a stored procedure): I can omit parts of queries that don’t make sense in some cases, and it’s easier to avoid executing unnecessary code.
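
To illustrate the “embedded in a stored procedure” comment, here is a rough sketch of the idea (the procedure name is hypothetical, and this is just one way it could be coded): the statement is assembled with only the criteria that apply, while the values themselves are still passed as parameters via sp_executesql.

CREATE PROCEDURE Sales.Contact$Search
    @FirstNameLike varchar(30) = NULL,
    @LastNameLike  varchar(30) = NULL
AS
BEGIN
    DECLARE @query nvarchar(1000) =
       N'SELECT FirstName, LastName, CompanyName
         FROM   Sales.Contact
         WHERE  1=1';

    --append only the criteria the caller actually supplied
    IF @FirstNameLike IS NOT NULL
        SET @query += N' AND FirstName LIKE @FirstNameLike + ''%''';
    IF @LastNameLike IS NOT NULL
        SET @query += N' AND LastName LIKE @LastNameLike + ''%''';

    --parameters that end up unreferenced in the text are simply ignored
    EXECUTE sp_executesql @query,
            N'@FirstNameLike varchar(30), @LastNameLike varchar(30)',
            @FirstNameLike = @FirstNameLike, @LastNameLike = @LastNameLike;
END;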

Flexibility over Shared Plans and Parameterization

Queries formed at runtime, using proper techniques, can actually be better for performance in many ways than using stored procedures. Because you have control over the queries, you can more easily build queries at runtime that use the same plans, and even can be parameterized as desired, based on the situation.

This is not to say that it is the most favorable way of implementing parameterization. (If you want to know the whole picture you have to read the whole section on ad hoc and stored procedures.) However, the fact is that ad hoc access tends to get a bad reputation for something that Microsoft fixed many versions back. In the following sections, “Shared Execution Plans” and “Parameterization,” I will take a look at the good points and the caveats you will deal with when building ad hoc queries and executing them on the server.

Shared Execution Plans

The age-old reason that people used stored procedures was because the query processor cached their plans. Every time you executed a procedure, the query processor didn’t have to work out from scratch the best way to execute the query. As of SQL Server 7.0 (which was released in 1998!), cached plans were extended to include ad hoc SQL. However, the standard for what can be cached is pretty strict. For two calls to the server to use the same plan, the statements that are sent must be identical, except possibly for the literal values in search arguments. Identical means identical; add a comment, change the case, or even add a space character, and the plan will no longer match. SQL Server can build query plans that have parameters, which allow plan reuse by subsequent calls. However, overall, stored procedures are better when it comes to using cached plans for performance, primarily because the matching and parameterization are easier for the optimizer to do, since it can be done by object_id, rather than having to match larger blobs of text.

A fairly major caveat is that for ad hoc queries to use the same plan, they must be exactly the same, other than any values that can be parameterized. For example, consider the following two queries. (I’m using WideWorldImporters tables for this example, as that database has a nice amount of data to work with.)

USE WideWorldImporters;
GO
SELECT People.FullName, Orders.OrderDate
FROM   Sales.Orders
                 JOIN Application.People
                        ON Orders.ContactPersonID = People.PersonID
WHERE  People.FullName = N'Bala Dixit';

Next, run the following query. See whether you can spot the difference between the two queries.

SELECT People.FullName, Orders.OrderDate
FROM   Sales.Orders
                 JOIN Application.People
                        on Orders.ContactPersonID = People.PersonID
WHERE  People.FullName = N'Bala Dixit';

These queries can’t share plans because ON in the first query’s FROM clause is uppercase and on in the second query’s FROM clause is lowercase. Using the sys.dm_exec_query_stats DMV, you can see that the case difference does cause two plans to be cached, by running the following:

SELECT  *
FROM    (SELECT qs.execution_count,
                SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                                ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                  END - qs.statement_start_offset) / 2) + 1) AS statement_text
         FROM   sys.dm_exec_query_stats AS qs
                CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
        ) AS queryStats
WHERE   queryStats.statement_text LIKE 'SELECT People.FullName, Orders.OrderDate%';

This SELECT statement will return at least two rows, one for each query you have just executed. (It could be more, depending on whether you have executed the statements in this section more than two times, and whether you messed up entering/copying the query like I did a few times.) Hence, it is important to use some method to make sure that every query that is essentially the same query is also formatted the same: queries must use the same format, capitalization, and so forth.

Parameterization

The next query plan topic to discuss for performance is parameterization. When a query is parameterized, only one version of the plan is needed to service many queries. Stored procedures are parameterized in all cases, but SQL Server can also parameterize ad hoc SQL statements. By default, the optimizer doesn’t parameterize most queries, and caches most plans as straight text, unless the query is “simple.” For example, it can only reference a single table (search for “Forced Parameterization” in Books Online for the complete details). When the query meets the strict requirements, the optimizer changes each literal it finds in the query string into a parameter. The next time the query is executed with different literal values, the same plan can be used. For example, take this simpler form of the previous query:

SELECT People.FullName
FROM   Application.People
WHERE  People.FullName = N'Bala Dixit';

The plan (from using SET SHOWPLAN_TEXT ON in the manner we introduced in Chapter 10, “Basic Index Usage Patterns”) is as follows:

  |--Index Seek(OBJECT:([WideWorldImporters].[Application].[People].
                                                      [IX_Application_People_FullName]),
                SEEK:([WideWorldImporters].[Application].[People].[FullName]
                                                     =[@1]) ORDERED FORWARD)

The value of N'Bala Dixit' has been changed to @1 (which is in bold in the plan), and the value is filled in from the literal at execute time. However, try executing this query that accesses two tables:

SELECT People.FullName, Orders.OrderDate
FROM   Sales.Orders
                 JOIN Application.People
                        ON Orders.ContactPersonID = People.PersonID
WHERE  People.FullName = N'Bala Dixit';

The plan won’t recognize the literal and parameterize it:

|--Nested Loops(Inner Join,
       OUTER REFERENCES:([WideWorldImporters].[Sales].[Orders].[OrderID], [Expr1004]) WITH          
                                                                        UNORDERED PREFETCH)
       |--Nested Loops(Inner Join,
                  OUTER REFERENCES:([WideWorldImporters].[Application].[People].[PersonID]))
       |    |--Index Seek(OBJECT:([WideWorldImporters].[Application].[People].
                                                          [IX_Application_People_FullName]),                
                  SEEK:([WideWorldImporters].[Application].[People].[FullName]
                                                           =N'Bala Dixit') ORDERED FORWARD)
       |    |--Index Seek(OBJECT:([WideWorldImporters].[Sales].[Orders].
                                                         [FK_Sales_Orders_ContactPersonID]),              
                  SEEK:([WideWorldImporters].[Sales].[Orders].[ContactPersonID]
                  =[WideWorldImporters].[Application].[People].[PersonID]) ORDERED FORWARD)
       |--Clustered Index Seek(OBJECT:
                  ([WideWorldImporters].[Sales].[Orders].[PK_Sales_Orders]),  
                        SEEK:([WideWorldImporters].[Sales].[Orders].[OrderID]=
                     [WideWorldImporters].[Sales].[Orders].[OrderID])
                                                                   LOOKUP ORDERED FORWARD)

Clearly it’s no longer a “simple” plan. Now the literal from the query is still in the plan as N’Bala Dixit’, rather than a parameter. Both plans are cached, but the first one can be used regardless of the literal value included in the WHERE clause. In the second, the plan won’t be reused unless the precise literal value of N’Bala Dixit’ is passed in.

If you want the optimizer to be more liberal in parameterizing queries, you can use the ALTER DATABASE command to force the optimizer to parameterize:

ALTER DATABASE WideWorldImporters
    SET PARAMETERIZATION FORCED;

Check the plan of the query with the join again. The plan now has replaced N'Bala Dixit' with [@0], and the query processor can reuse this plan no matter what the value for the literal is. Note that there is a reason that more than simple plans are not parameterized by default. This can be a costly operation in comparison to normal, text-only plans, so not every system should use this setting. However, if your system is running the same, reasonably complex-looking queries over and over, this can be a wonderful setting to avoid paying for query optimization over and over.

Not every query will be parameterized when forced parameterization is enabled. For example, change the equality to a LIKE condition:

SELECT People.FullName, Orders.OrderDate
FROM   Sales.Orders
                 JOIN Application.People
                        ON Orders.ContactPersonID = People.PersonID
WHERE  People.FullName LIKE N'Bala Dixit';

The plan will contain the literal, rather than the parameter, because it cannot parameterize the second and third arguments of a LIKE operator comparison (the arguments of the LIKE operator are arg1 LIKE arg2 [ESCAPE arg3]; we are only using the first two in our example). The following is a partial plan:

WHERE:([WideWorldImporters].[Application].[People].[FullName] like N'Bala Dixit')
                                                                         ORDERED FORWARD)

If you change the query to end with WHERE N’Bala Dixit’ LIKE People.FullName, it would be parameterized, but that construct is rarely what is desired.
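
Since forcing parameterization affects every query in the database, if you tried this on a shared test copy of WideWorldImporters, you will probably want to check the setting and put it back when you are done experimenting. A quick sketch:

--is_parameterization_forced = 1 indicates the FORCED setting
SELECT name, is_parameterization_forced
FROM   sys.databases
WHERE  name = N'WideWorldImporters';

--return to the default behavior
ALTER DATABASE WideWorldImporters
    SET PARAMETERIZATION SIMPLE;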

For your applications, another method is to use parameterized calls from the data access layer. Using ADO.NET, for example, this would entail using T-SQL variables in your query strings, and then using a SqlCommand object and its Parameters collection. SQL that is parameterized on the client will, in turn, produce a parameterized plan that is saved and can be reused.

The idea that performance is definitely worse with ad hoc calls is largely a myth (certainly after 7.0, which was a very long time ago indeed). Performance when using ad hoc calls to the SQL Server in your applications can certainly be less of a worry than you might have been led to believe. However, don’t stop reading here. While performance may not suffer tremendously, the inability to easily performance tune queries is one of the pitfalls, since once you have compiled that query into your application, changing the query is never as easy as it might seem during the development cycle.

So far, we have just executed queries directly, but there is a better method when building your interfaces that allows you to parameterize queries in a very safe manner. Using sp_executesql, you can fashion your SQL statement using variables to parameterize the query:

DECLARE @FullName nvarchar(60) = N'Bala Dixit',
        @Query nvarchar(500),
        @Parameters nvarchar(500);
SET @Query= N'SELECT People.FullName, Orders.OrderDate
              FROM   Sales.Orders
                         JOIN Application.People
                                ON Orders.ContactPersonID = People.PersonID
              WHERE  People.FullName LIKE @FullName';
SET @Parameters = N'@FullName nvarchar(60)';
EXECUTE sp_executesql @Query, @Parameters, @FullName = @FullName;

Using sp_executesql is generally considered the safest way to parameterize ad hoc SQL queries because it does a good job of parameterizing the query and helps you avoid issues like SQL injection attacks, which I will cover later in this chapter.

Finally, if you know you need to reuse the query multiple times, you can compile it and save the plan for reuse. This is generally useful if you are going to have to execute the same statement over and over. Instead of sp_executesql, use sp_prepare to prepare the plan; only this time you won’t use the actual value:

DECLARE @Query nvarchar(500),
        @Parameters nvarchar(500),
                @Handle int
SET @Query= N'SELECT People.FullName, Orders.OrderDate
              FROM   Sales.Orders
                         JOIN Application.People
                                ON Orders.ContactPersonID = People.PersonID
              WHERE  People.FullName LIKE @FullName';
SET @Parameters = N'@FullName nvarchar(60)';
EXECUTE sp_prepare @Handle output, @Parameters, @Query;
SELECT @handle;

That batch will return a value that corresponds to the prepared plan (in my case it was 1). This value is your handle to the plan, which you can use with sp_execute on the same connection only. All you need to execute the query is the parameter values and the sp_execute statement, and you can use and reuse the plan as needed:

DECLARE  @FullName nvarchar(60) = N'Bala Dixit';
EXECUTE sp_execute 1, @FullName;
SET @FullName = N'Bala%';
EXECUTE sp_execute 1, @FullName;

It is fairly rare that anyone will manually execute sp_prepare and sp_execute, but they are very frequently built into engines that manage ad hoc access for you. They can be good for performance, but they are a pain for troubleshooting because you have to decode what the handle 1 actually represents. When you are done with the plan, you can release it using sp_unprepare, passing the handle as a parameter.
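
For example, a quick sketch that releases the plan prepared earlier (using the handle value of 1 that sp_prepare returned in my case):

EXECUTE sp_unprepare 1;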

What you end up with is pretty much the same as a procedure for performance, but it has to be done every time you run your app, and it is scoped to a connection, not shared across all connections. The better solution, strictly from the standpoint of parameterizing complex statements, is a stored procedure. Generally, the only way this approach makes sense as a best practice is when you have very flexible queries being executed (like in the first part of this section on ad hoc SQL), or when you cannot use procedures, perhaps because of your tool choice or because you are using a third-party application. Why? Well, the fact is that with stored procedures, the query code is stored on the server and is a layer of encapsulation that reduces coupling; but more on that in the “Stored Procedures” section.

Pitfalls

I’m glad you didn’t stop reading at the end of the previous section, because although I have covered the good points of using ad hoc SQL, there are the following significant pitfalls, as well:

  • Low cohesion, high coupling
  • Batches of statements
  • Security issues
  • SQL injection
  • Performance-tuning difficulties

Low Cohesion, High Coupling

The number-one pitfall of using ad hoc SQL as your interface relates to what you ideally learned back in Programming 101: strive for high cohesion, low coupling. Cohesion means that the different parts of the system work together to form a meaningful unit. This is a good thing, as you don’t want to include lots of irrelevant code in the system, or be all over the place. On the other hand, coupling refers to how connected the different parts of a system are to one another. It’s considered bad when a change in one part of a system breaks other parts of a system. (If you aren’t too familiar with these terms, minimally go to www.wikipedia.org and search for these terms. You should build all the code you create with these concepts in mind.)

When issuing T-SQL statements directly from the application, the structures in the database are tied directly to the client interface. This sounds perfectly normal and acceptable at the beginning of a project, but it means that any change in database structure might require a change in the user interface. This in turn means that making small changes to the system is just as costly as making large ones, because a full testing cycle is required.

Image Note  When I started this section, I told you that I wouldn’t make any distinction between toolsets used. This is still true. Whether you use a horribly, manually coded system or the best object-relational mapping system, the fact that the application tier knows and is built specifically with knowledge of the base structure of the database is an example of the application and data tiers being highly coupled. Though stored procedures are similarly inflexible, they are stored with the data, allowing the disparate systems to be decoupled: the code on the database tier can be structurally dependent on the objects in the same tier without completely sacrificing your loose coupling.

For example, consider that you’ve created an employee table, and you’re storing the employee’s spouse’s name, as shown in Figure 13-2.

9781484219720_13_Fig2.jpg

Figure 13-2. An employee table

Now, some new regulation requires that you have to include the ability to have more than one spouse, which necessitates a new table, as shown in Figure 13-3.

9781484219720_13_Fig3.jpg

Figure 13-3. Adding the ability to have more than one spouse

The user interface must immediately be morphed to deal with this case, or at the least, you need to add some code to abstract this new way of storing data. In a scenario such as this, where the condition is quite rare (certainly most everyone will have zero or one spouse), a likely solution would be simply to encapsulate the one spouse into the employee table via a view, change the name of the object the non-data tier accesses, and the existing UI would still work. Then you can add support for the atypical case with some sort of OtherSpouse functionality that would be used if the employee had more than one spouse. The original UI would continue to work, but a new form would be built for the case where COUNT(Spouse) > 1.
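
A sketch of that kind of encapsulation might look like the following view; the schema, table, and column names are hypothetical stand-ins for the structures in Figures 13-2 and 13-3, and one spouse row is picked arbitrarily (here with MIN) just for illustration:

CREATE VIEW HumanResources.EmployeeWithSpouse
AS
SELECT Employee.EmployeeId, Employee.EmployeeNumber, Spouse.SpouseName
FROM   HumanResources.Employee
         LEFT OUTER JOIN
            (SELECT EmployeeId, MIN(SpouseName) AS SpouseName
             FROM   HumanResources.EmployeeSpouse
             GROUP  BY EmployeeId) AS Spouse
            ON Spouse.EmployeeId = Employee.EmployeeId;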

Batches of More Than One Statement

A major problem with ad hoc SQL access is that when you need to do multiple commands and treat them as a single operation, it becomes increasingly difficult to build the mechanisms in the application code to execute multiple statements as a batch, particularly when you need to group statements together in a transaction. When you have only individual statements, it’s easy to manage ad hoc SQL for the most part. Some queries can get mighty complicated and difficult, but generally speaking, things pretty much work great when you have single statements per transaction. However, as complexity rises in the things you need to accomplish in a transaction, things get tougher. What about the case where you have 20 rows to insert, update, and/or delete at one time and all in one transaction?

Two different things usually occur in this case. The first way to deal with this situation is to start a transaction using functional code. For the most part the best practice is never to let transactions span batches, and you should minimize starting transactions using an ADO.NET object (or something like it). This isn’t a hard and fast rule, as it’s usually fine to do this with a middle-tier object that requires no user interaction. However, if something occurs during the execution of the object, you can still leave open connections to the server.

The second way to deal with this is to build a batching mechanism for batching SQL calls. Implementing the first method is self-explanatory, but the second is to build a code-wrapping mechanism, such as the following:

BEGIN TRY
BEGIN TRANSACTION;
    <-- statements go here
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
    THROW 50000, '<describe what happened>',16;
END CATCH;

For example, if you wanted to send a new invoice and line items, the application code would need to build a batch such as in the following code. Each of the SET @Action = … and INSERT statements would be put into the batch by the application, with the rest being boilerplate code that is repeatable.

SET NOCOUNT ON;
BEGIN TRY
BEGIN TRANSACTION;
       DECLARE @Action nvarchar(200);
       SET @Action = 'Invoice Insert';
       INSERT Invoice (Columns) VALUES (Values);
       SET @Action = 'First InvoiceLineItem Insert';
       INSERT InvoiceLineItem (Columns) VALUES (Values);
       SET @Action = 'Second InvoiceLineItem Insert';
       INSERT InvoiceLineItem (Columns) VALUES (Values);
       SET @Action = 'Third InvoiceLineItem Insert';
       INSERT InvoiceLineItem (Columns) VALUES (Values);
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
    DECLARE @Msg nvarchar(4000);
    SET @Msg = @Action + ' : Error was: ' + CAST(ERROR_NUMBER() AS varchar(10)) + ':' +
                         ERROR_MESSAGE();
    THROW 50000, @Msg,1;
END CATCH;

Executing multiple statements in a transaction is done on the server and the transaction either completes or not. There’s no chance that you’ll end up with a transaction swinging in the wind, ready to block the next user who needs to access the locked row (even in a table scan for unrelated items). A downside is that it does stop you from using any interim values from your statements to set values in following statements (without some really tricky coding/forethought, as you are generating this code from other code), but starting transactions outside of the batch in which you commit them is asking for blocking and locking due to longer batch times.

Starting the transaction outside of the server using application code is likely easier, but building this sort of batching interface is usually the preferred way to go. First, it’s better for concurrency, because only one batch needs to be executed instead of many little ones. Second, the execution of this batch won’t have to wait on communications back and forth from the server before sending the next command. It’s all there and happens in the single batch of statements. And if you are using in-memory tables, and all of your readers use SNAPSHOT isolation level, while you may never see any blocking, you could be using far more RAM than you expect due to versions being stuck in memory to satisfy your longer transactions. Bottom line: keep transactions as short as possible, in a single batch if possible.

Image Note  The problem I have run into in almost all cases is that building a batch of multiple SQL statements is a very unnatural thing for object-oriented code to do. The way the code is generally set up is more one table to one object, where the Invoice and InvoiceLineItem objects have the responsibility of saving themselves. It is a lot easier to code, but too often issues come up in the execution of multiple statements, and then the connection gets left open and other connections get blocked behind the locks that are held because of the open transaction (if the coders even think about using transactions at all; when they don’t, things really do get messy for the data).

Security Issues

Security is one of the biggest downsides to using ad hoc access. For a user to use his or her own Windows login to access the server, you have to grant too many rights to the system, whereas with stored procedures, you can simply give access to the stored procedures. Using ad hoc T-SQL, you have to go with one of three possible security patterns, each with its own downsides:

  • Use one login for the application: This, of course, means that you have to code your own security system for application access rather than using what SQL Server gives you. This even includes some form of login and password for application access, as well as individual object access.
  • Use application roles: This is slightly better; while you have to implement security in your code since all application users will have the same database security, at the very least, you can let SQL Server handle the data access via normal logins and passwords (probably using Windows Authentication). Using application roles can be a good way to let users use Windows Authentication, but still not have direct access to the system (a brief sketch follows this list). The password is passed in clear text, so you will need to use encrypted connections if there are any security concerns.
  • Give the user direct access to the tables, or possibly views: Unfortunately, this opens up your tables to users who discover the magical world of Management Studio, where they can open a table and immediately start editing it without any of those pesky UI data checks or business rules that only exist in your data access layer.
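
As a rough sketch of the application-role approach from the second bullet (the role name and password here are hypothetical), the role is created once by an administrator, and the application activates it after connecting with the user’s own login:

--created once in the database; end users are not told the password
CREATE APPLICATION ROLE AppAccess WITH PASSWORD = 'SomeStr0ngPassword!';
GO
--executed by the application after the user connects with Windows Authentication
EXECUTE sp_setapprole @rolename = 'AppAccess', @password = 'SomeStr0ngPassword!';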

Usually, almost all applications follow the first of the three methods. Building your own login and application security mechanisms is just considered a part of the process, and just about as often, it is considered part of the user interface’s responsibilities. At the very least, by not giving all users direct access to the tables, the likelihood of them mucking around in the tables editing data all willy-nilly is greatly minimized. With procedures, you can give the users access to stored procedures, which are not natural for them to use and certainly would not allow them to accidentally delete data from a table.
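
In code, that procedure-only pattern boils down to grants such as the following sketch (the role and procedure names are hypothetical); note that no rights on the tables themselves are granted at all:

CREATE ROLE OrderEntryRole;
--members can execute the procedure but cannot touch Sales.Contact directly
GRANT EXECUTE ON OBJECT::Sales.Contact$Update TO OrderEntryRole;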

The other issues security-wise are basically performance related. SQL Server must evaluate security for every object as it’s used, rather than once at the object level for stored procedures—that is, if the owner of the procedure owns all objects. This isn’t generally a big issue, but as your need for greater concurrency increases, everything becomes an issue!

Image Caution  If you use the single-application login method, make sure not to use an account with system administration or database owner privileges. Doing so opens up your application to programmers making mistakes, and if you miss something that allows SQL injection attacks, which I describe in the next section, you could be in a world of hurt.

SQL Injection

A big risk with ad hoc queries is being hacked by a SQL injection attack. Unless you (and/or your toolset) program your ad hoc SQL intelligently and/or (mostly and) use the parameterizing methods we discussed earlier in this section on ad hoc SQL, a user could inject something such as the following:

' + char(13) + char(10) + ';SHUTDOWN WITH NOWAIT;' + '--'

In this case, the command might just shut down the server if the security context that executes the statements has rights, but you can probably see far greater attack possibilities. I’ll discuss more about injection attacks and how to avoid them in the “Stored Procedures” section because they can have some of the same issues if using dynamic SQL. When using ad hoc SQL, you must be careful to avoid these types of issues for every call.

A SQL injection attack is not terribly hard to beat, but the fact is, for any user-enterable text where you don’t use some form of parameterization, you have to make sure to escape any single-quote characters that a user passes in. For general text entry, like a name, commonly if the user passes in a string like “O’Malley”, you must change this to ’O’’Malley’. For example, consider the following batch, where the resulting query will fail. (I have to escape the single quote in the literal to allow the query to execute, but that only translates to a single quote in the EXECUTE statement.)

DECLARE @value varchar(30) = 'O''Malley';
SELECT 'SELECT '''+ @value + '''';
EXECUTE ('SELECT '''+ @value + '''');

This will return

SELECT 'O'Malley'
Msg 105, Level 15, State 1, Line 1
Unclosed quotation mark after the character string ''.

This is a very common problem that arises, and all too often the result is that the user learns not to enter a single quote in the query parameter value. But you can make sure this doesn’t occur by changing all single quotes in the value to double single quotes, like in the DECLARE @value statement. The best method is to use the QUOTENAME function, which is there to escape names of objects and such, but works for strings as well:

DECLARE @value varchar(30) = 'O''Malley', @query nvarchar(300);
SELECT @query = 'SELECT ' + QUOTENAME(@value,'''');
SELECT @query;
EXECUTE (@query );

This returns

SELECT 'O''Malley'
--------
O'Malley

Now, if someone tries to put a single quote, semicolon, and some other statement in the value, it doesn’t matter; it will always be treated as a literal string:

DECLARE @value varchar(30) = 'O''; SELECT ''badness',
        @query nvarchar(300);
SELECT  @query = 'SELECT ' + QUOTENAME(@value,'''');
SELECT  @query;
EXECUTE (@query );

The query is now as follows, followed by its return:

SELECT 'O''; SELECT ''badness'
-------------------
O'; SELECT 'badness

However, what isn’t quite so obvious is that you have to do this for every single string, even if the string value could never legally (due to application and constraints) have a single quote in it. If you don’t double up on the quotes, the person could put in a single quote and then a string of SQL commands; this is where you get hit by the injection attack. And as I said before, the safest method of avoiding injection issues is to always parameterize your queries, though it can be very hard to use variable query conditions with the parameterized method, losing some of the value of using ad hoc SQL. For example, the same injection attempt is harmless when the value is passed as a parameter:

DECLARE @value varchar(30) = 'O''; SELECT ''badness',
        @query nvarchar(300) = N'SELECT @value',
        @parameters nvarchar(200) = N'@value varchar(30)';
SELECT  @query;
EXECUTE sp_executesql @query, @parameters, @value = @value;

If you employ ad hoc SQL in your applications, I strongly suggest you do more reading on the subject of SQL injection, and then go in and look at the places where SQL commands can be sent to the server to make sure you are covered. SQL injection is especially dangerous if the accounts being used by your application have too much power because you didn’t set up particularly granular security.

Note that if all of this talk of parameterization sounds complicated, it kind of is. Generally, parameterization gets handled well in one of two ways:

  • By building a framework that forces you to follow the correct pattern (as do most object-relational tools, though they often force—or at least lead—you into suboptimal patterns of execution, like dealing with every statement separately without transactions.)
  • By using stored procedures. Stored procedures parameterize in a manner that is impervious to SQL injection (except when you use dynamic SQL, which, as I will discuss later, is subject to the same issues as ad hoc access from any client).

Difficulty Tuning for Performance

Performance tuning is the second most important part of any computer project. It is second only to getting the right answer. While ad hoc SQL has a few performance benefits that we have already discussed, in far more cases than not, ad hoc requests are more difficult to tune, for a few reasons:

  • Unknown queries: The application can be programmed to send any query it wants, in any way. Unless very extensive testing is done, slow or dangerous scenarios can slip through. With procedures, you have a tidy catalog of the possible queries that might be executed, and each can be tested. Of course, this concern can be mitigated by having a single module where SQL code can be created and a method to list all possible queries that the application layer can execute (however, that takes more discipline than most organizations have).
  • Often requires people from multiple teams: It may seem silly, but when something is running slower, it is always SQL Server’s fault (well, it is certainly blamed first in any case). With ad hoc calls, the best the database administrator can do is use Extended Events to capture the queries that are executed (as sketched just after this list), see if an index could help, and call for a programmer, leading to the issue in the next bullet.
  • Front-end recompile required for changing queries: If you want to change how a query works, the app likely needs to be rebuilt and redeployed. For stored procedures, you’d simply modify the query without the client knowing.
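
For illustration, the following is a minimal sketch of the kind of Extended Events session a database administrator might use to capture the batches an application sends; the session name and file target name are arbitrary choices of mine, not part of the chapter’s running example:

CREATE EVENT SESSION CaptureAdHocBatches ON SERVER
ADD EVENT sqlserver.sql_batch_completed
   (ACTION (sqlserver.client_app_name, sqlserver.username)
    WHERE (sqlserver.database_name = N'WideWorldImporters'))
ADD TARGET package0.event_file (SET filename = N'CaptureAdHocBatches');
GO
ALTER EVENT SESSION CaptureAdHocBatches ON SERVER STATE = START;

Reading the captured batch_text values from the event file tells you which statements to chase with indexes (or with a conversation with the programmers).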

These reasons seem small during the development phase, but often they’re the real killers for tuning, especially when you get a third-party application and its developers have implemented a dumb query that you could easily optimize, but since the code is hard-coded in the application, modification of the query isn’t possible (not that this regularly happens; no, not at all; nudge, nudge, wink, wink).

SQL Server 2005 gave us plan guides (and there are some improvements in their usability in later versions) that can be used to force a plan for queries, ad hoc calls, and procedures when you have troublesome queries that don’t optimize naturally. SQL Server 2016 adds the Query Store, which lets you see the plans a query has used over time and force the plan that should be used. But the fact is, going in and editing a query to tune it is far easier than using plan guides, particularly in light of how ad hoc SQL can change in a less manageable manner than a stored procedure.
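
If you have not tried the Query Store, turning it on and forcing a plan takes only a few statements; the @query_id and @plan_id values below are placeholders you would look up in the Query Store catalog views for your own troublesome query:

ALTER DATABASE WideWorldImporters SET QUERY_STORE = ON;
GO
--find the query and the plan you want to keep
SELECT qsq.query_id, qsp.plan_id, qst.query_sql_text
FROM   sys.query_store_query AS qsq
         JOIN sys.query_store_query_text AS qst
            ON qst.query_text_id = qsq.query_text_id
         JOIN sys.query_store_plan AS qsp
            ON qsp.query_id = qsq.query_id;
--force the chosen plan (placeholder identifiers)
EXECUTE sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;

Even so, as noted earlier, simply editing the query in a stored procedure is usually the easier path.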

Stored Procedures

Stored procedures are compiled batches of SQL code that can be parameterized to allow for easy reuse. The basic structure of a typical interpreted stored procedure follows. (See SQL Server Books Online at msdn.microsoft.com/en-us/library/ms187926.aspx for a complete reference.)

CREATE PROCEDURE <procedureName>
[(
         @parameter1  <datatype> [ = <defaultvalue> [OUTPUT]]
         @parameter2  <datatype> [ = <defaultvalue> [OUTPUT]]
         ...
         @parameterN  <datatype> [ = <defaultvalue> [OUTPUT]]
)]
AS
<T-SQL statements>

You can put pretty much any statements that could have been sent as ad hoc calls to the server into a stored procedure and call them as a reusable unit. (Natively compiled procedures have far more limits, which I will cover in more detail in a later subsection of this section.) You can return an integer value from the procedure by using the RETURN statement, or return almost any datatype by declaring the parameter as an output parameter. The exceptions are the text and image datatypes, which you shouldn’t be using anyway because they have been deprecated in favor of the (max) datatypes and, perhaps more importantly, are pretty horrible to work with. (You also cannot use a table-valued parameter as an output parameter.) After the AS, you can execute any T-SQL commands you need to, using the parameters like variables. I will largely ignore natively compiled stored procedures for now, deferring discussion to a subsection of “Advantages” that covers how using stored procedures helps you make, or prepare to make, the most of the in-memory engine.
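
As a quick, hypothetical illustration of the RETURN statement and an output parameter (the procedure and parameter names here are mine, not objects used elsewhere in the chapter):

CREATE PROCEDURE dbo.Demo$ReturnAndOutput
(
        @RowsExpected int,
        @Message varchar(100) OUTPUT
)
AS
BEGIN
        SET @Message = CONCAT('You asked about ', @RowsExpected, ' rows');
        RETURN @RowsExpected * 2; --RETURN can only carry an integer;
                                  --anything else goes in OUTPUT parameters
END;
GO
DECLARE @ReturnValue int, @Message varchar(100);
EXECUTE @ReturnValue = dbo.Demo$ReturnAndOutput @RowsExpected = 10,
                                                @Message = @Message OUTPUT;
SELECT @ReturnValue AS ReturnValue, @Message AS Message;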

The following is an example of a basic procedure to retrieve rows from a table (continuing to use the WideWorldImporters tables for these examples):

CREATE PROCEDURE Sales.Orders$Select
(
        @FullNameLike nvarchar(100) = ’%’,
        @OrderDateRangeStart date = ’1900-01-01’,
        @OrderDateRangeEnd date = ’9999-12-31’
) AS
BEGIN
      SELECT People.FullName, Orders.OrderDate
      FROM   Sales.Orders
               JOIN Application.People
                  ON Orders.ContactPersonID = People.PersonID
      WHERE  People.FullName LIKE @FullNameLike
             --Inclusive since using Date type
        AND  OrderDate BETWEEN @OrderDateRangeStart
                                 AND @OrderDateRangeEnd;
END;

Now instead of having the client programs formulate a query by knowing the table structures, the client can simply issue a command, knowing a procedure name and the parameters. Clients can choose from three possible criteria to select the orders they want. For example, they’d use the following if they want to find orders placed by Bala Dixit:

EXECUTE Sales.Orders$Select @FullNameLike = ’Bala Dixit’;

Or they could use the other parameters:

EXECUTE Sales.Orders$Select @FullNameLike = ’Bala Dixit’,
                    @OrderDateRangeStart = ’2016-01-01’,
                    @OrderDateRangeEnd = ’2016-12-31’;

The client doesn’t know whether the database or the code is well built or even if it’s horribly designed (and it is far easier to get away with bad code in a stored procedure than in code generated by another tier). Originally, our name value might have been a column in the Orders table, later moved to its own table when an actual data architect was hired. Often, the tasks the client needs to do won’t change based on the database structures, so why should the client need to know the database structures?

For much greater detail about how to write stored procedures and good T-SQL, consider the books Pro T-SQL 2008 Programmer’s Guide by Michael Coles (Apress, 2008) and Inside Microsoft SQL Server 2008: T-SQL Programming (Pro-Developer) by Itzik Ben-Gan et al. (Microsoft Press, 2009); or search for “stored procedures” in Books Online. In this section, I’ll look at some of the advantages of using stored procedures as our primary interface between client and data, followed up by the pitfalls of such methods.

Advantages

The advantages of using stored procedures are considerable, and they grow even greater if you want to take full advantage of the in-memory OLTP engine. I’ll discuss the following topics:

  • Encapsulation: Limits client knowledge of the database structure by providing a simple interface for known operations.
  • Dynamic procedures: Gives the best of both worlds, allowing for ad hoc–style code without giving ad hoc access to the database.
  • Security: Provides a well-formed interface for known operations that allows you to apply security only to this interface and disallow other access to the database.
  • Performance: Allows for efficient parameterization of any query, as well as tweaks to the performance of any query, without changes to the client interface.
  • Making full use of the in-memory OLTP engine: Requires use of natively compiled stored procedures. If your architecture is currently based on stored procedures, making this change will be far easier. I will describe a few of the limitations and differences between interpreted and natively compiled stored procedures.

Encapsulation

To me, encapsulation is the primary reason for using stored procedures, and it’s the leading reason behind all the other topics that I’ll discuss. When talking about encapsulation, the idea is to hide the working details from processes that have no need to know about the details. Encapsulation is a large part of the desired “low coupling” of our code that I discussed in the pitfalls of ad hoc access. Some software is going to have to be coupled to the data structures, of course, but locating that code with the structures makes it easier to manage. Plus, the people who generally manage the SQL Server code on the server are not the same people who manage the compiled code.

For example, when we coded the Sales.Orders$Select procedure, it was unimportant to the client how the procedure was coded. We could have built it based on a view and selected from it, or the procedure could call 16 different stored procedures to improve performance for different parameter combinations. We could even have used the dreaded cursor version that you find in some production systems:

--pseudocode:
CREATE PROCEDURE Sales.Orders$Select
...
Create temp table;
Declare cursor for (select all rows from the query table);
Fetch first row;
While not end of cursor (@@fetch_status)
 Begin
         Check columns for a match to parameters;
         If match, put into temp table;
         Fetch next row;
 End;
SELECT * FROM temp table;

This is horrible code to be sure. I didn’t give real code so it wouldn’t be confused for a positive example and imitated. However, it certainly could be built to return correct data and possibly could even be fast enough for smaller data sets (“Works on my machine!”). Even better, if we do things right, when the client executes the following code, they get the same result, regardless of the internal code:

EXECUTE Sales.Orders$Select @FullNameLike = ’Bala Dixit’,
                            @OrderDateRangeStart = ’2016-01-01’,
                            @OrderDateRangeEnd = ’2016-12-31’;

What makes procedures great is that you can rewrite the guts of the procedure using the server’s native language without any concern for breaking any client code. This means that anything can change, including table structures, column names, and coding method (cursor, join, and so on), and no client code need change as long as the inputs and outputs stay the same.

The only caveat to this is that you can reliably get metadata about a procedure’s output only when it is written as straightforward SQL without conditionals that change the shape of the result (IF condition SELECT … ELSE SELECT …). For example, using sp_describe_first_result_set you can get metadata about the procedure we wrote earlier:

EXECUTE sp_describe_first_result_set
           N’Sales.Orders$Select’;

Or even with parameters:

EXECUTE sp_describe_first_result_set
           N’Sales.Orders$Select @FullNameLike = ’’Bala Dixit’’’;

This returns metadata about what will be returned (this is just a small amount of what is returned in the columns):

column_ordinal name                  system_type_name
 -------------- --------------------- ------------------------------
 1              FullName              nvarchar(50)                  
 2              OrderDate             date

For poorly formed procedures, you will not be able to get the metadata, even when every code path returns a result set with the same column names. For example, consider the following procedure:

CREATE PROCEDURE dbo.Test (@Value int = 1)
AS
IF @value = 1
    SELECT ’FRED’ as Name;
ELSE
    SELECT 200 as Name;        

If you run the procedure with 1 for the parameter, it will return FRED, and for any other value, it will return 200. Both columns are named Name, but they are not the same type. So checking the result set:

EXECUTE sp_describe_first_result_set N’dbo.Test’

returns this (actually quite) excellent error message, letting us know that the output of this procedure can’t be determined because one call may have a varchar type, and the next an integer:

Msg 11512, Level 16, State 1, Procedure sp_describe_first_result_set, Line 1
The metadata could not be determined because the statement ’SELECT ’FRED’ AS Name;’ in procedure ’Test’ is not compatible with the statement ’SELECT 200 AS Name;’ in procedure ’Test’.

This concept of having easy access to the code may seem like an insignificant consideration, especially if you generally only work with limited-size sets of data. The problem is, as data set sizes fluctuate, the types of queries that will work often vary greatly. When you start dealing with increasing orders of magnitude in the number of rows in your tables, queries that seemed just fine at ten thousand rows start to fail to produce the kind of performance that you need, so you have to tweak the queries to get results in an amount of time that users won’t complain to your boss about. I will cover more details about performance tuning in a later section.

Image Note  Some of the benefits of building your objects in the way that I describe can also be achieved by building a solid middle-tier architecture with a data layer that is flexible enough to deal with change. However, I will always argue that it is easier to build your data access layer in the T-SQL code that is built specifically for data access. Unfortunately, it doesn’t solve the code ownership issues (functional versus relational programmers), nor does it solve the issue with performance-tuning the code.

Dynamic Procedures

You can dynamically create and execute code in a stored procedure, just like you can from the front end. Often, this is necessary when it’s just too hard to get a good answer using the rigid requirements of precompiled stored procedures. For example, say you need a procedure that requires a lot of optional parameters. It can be easier to include only the parameters for which the user passes in a value and let the compilation be done at execution time, especially if the procedure isn’t used all that often. The same parameter sets will get their own plan saved in the plan cache anyhow, just like for typical ad hoc SQL.

Clearly, some of the problems of straight ad hoc SQL pertain here as well, most notably SQL injection. You must always make sure that no input a user can enter will allow the user to return results you did not intend, letting them poke around your system without anyone knowing. As mentioned before, a common way to avoid this sort of thing is always to check the parameter values and immediately double up the single quotes so that the caller can’t inject malicious code where it shouldn’t be.
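
If you are doing the doubling by hand rather than relying on QUOTENAME, it is typically just a REPLACE call; a minimal sketch:

DECLARE @value nvarchar(30) = N'O''; SELECT ''badness';
--double up any single quotes the caller passed in
SET @value = REPLACE(@value, '''', '''''');
SELECT @value AS EscapedValue;

The escaped value can then be wrapped in single quotes when it is concatenated into the dynamic statement.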

Make sure that any parameters that don’t need quotes (such as numbers) are placed into the correct datatype. If you use a string value for a number, you can insert things such as ’novalue’ and check for it in your code, but another user could put in the injection attack value and be in like Flynn. For example, take the sample procedure from earlier, and let’s turn it into the most obvious version of a dynamic SQL statement in a stored procedure:

ALTER PROCEDURE Sales.Orders$Select
(
        @FullNameLike nvarchar(100) = ’%’,
        @OrderDateRangeStart date = ’1900-01-01’,
        @OrderDateRangeEnd date = ’9999-12-31’
) AS
BEGIN
        DECLARE @query varchar(max) =
        CONCAT(’
          SELECT People.FullName, Orders.OrderDate
          FROM   Sales.Orders
                                 JOIN Application.People
                                        ON Orders.ContactPersonID = People.PersonID
          WHERE  OrderDate BETWEEN ’’’, @OrderDateRangeStart, ’’’
                               AND ’’’, @OrderDateRangeEnd,’’’
                                           AND People.FullName LIKE ’’’, @FullNameLike, ’’’’ );
         SELECT @query; --for testing
         EXECUTE (@query);
END;

There are two problems with this version of the procedure. The first is that you don’t get the full benefit, because the final query still contains every parameter as a search argument, even the useless ones, which can make using indexes more difficult; eliminating those is one of the main reasons I use dynamic procedures in the first place. I’ll fix that in the next version of the procedure, but the more important problem is the injection attack. For example, let’s assume that the user who’s running the application has dbo powers or rights to sysusers. The user executes the following statement:

EXECUTE Sales.Orders$Select @FullNameLike = ’~;’’select name from sysusers--’,
                                    @OrderDateRangeStart = ’2016-01-01’;

This returns three result sets: the two (including the test SELECT) from before plus a list of all of the users in the WideWorldImporters database. No rows will be returned in the proper result sets, because no FullName values happen to be LIKE ’~’, but the list of users is not a good thing because, with some work, a decent hacker could probably figure out how to use a UNION and get back the users as part of the normal result set.

The easy way to correct this is to use the QUOTENAME() function to make sure that all values that need to be surrounded by single quotes are formatted in such a way that no matter what a user sends to the parameter, it cannot cause a problem. Note that if you programmatically choose columns, you ought to use the QUOTENAME() function to put the brackets around the name; SELECT QUOTENAME(’FRED’) would return [FRED].

This would look like:

ALTER PROCEDURE Sales.Orders$Select
(
        @FullNameLike nvarchar(100) = ’%’,
        @OrderDateRangeStart date = ’1900-01-01’,
        @OrderDateRangeEnd date = ’9999-12-31’
) AS
BEGIN
        DECLARE @query varchar(max) =
        CONCAT(’
      SELECT People.FullName, Orders.OrderDate
          FROM   Sales.Orders
                   JOIN Application.People
                      ON Orders.ContactPersonID = People.PersonID
          WHERE  People.FullName LIKE ’, QUOTENAME(@FullNameLike,’’’’), ’
                AND  OrderDate BETWEEN ’, QUOTENAME(@OrderDateRangeStart,’’’’), ’
                                       AND ’, QUOTENAME(@OrderDateRangeEnd,’’’’));
         SELECT @query; --for testing
         EXECUTE (@query);
END;

In the next code block, I will change the procedure to safely deal with invalid quote characters, and instead of just blindly using the parameters, if a parameter value is the same as its default, I will leave that condition off the WHERE clause. Note that using sp_executesql with fully parameterized values is not straightforward when the WHERE or JOIN clauses themselves vary, so you have to take care to avoid SQL injection in the query you build.

ALTER PROCEDURE Sales.Orders$Select
(
        @FullNameLike nvarchar(100) = ’%’,
        @OrderDateRangeStart date = ’1900-01-01’,
        @OrderDateRangeEnd date = ’9999-12-31’
) AS
BEGIN
        DECLARE @query varchar(max) =
        CONCAT(’
          SELECT People.FullName, Orders.OrderDate
          FROM   Sales.Orders
                   JOIN Application.People
                      ON Orders.ContactPersonID = People.PersonID
          WHERE  1=1
          ’,
           --ignore @FullNameLike parameter when it is set to all
           CASE WHEN @FullNameLike <> ’%’ THEN
                 CONCAT(’ AND  People.FullName LIKE ’, QUOTENAME(@FullNameLike,’’’’))
           ELSE ’’ END,
           --ignore @date parameters when it is set to all
           CASE WHEN @OrderDateRangeStart <> ’1900-01-01’ OR
                      @OrderDateRangeEnd <> ’9999-12-31’
                        THEN
           CONCAT(’AND  OrderDate BETWEEN ’, QUOTENAME(@OrderDateRangeStart,’’’’), ’
                                       AND ’, QUOTENAME(@OrderDateRangeEnd,’’’’))
                        ELSE ’’ END);
          SELECT @query; --for testing
          EXECUTE (@query);
END;

Now I might get a much better plan, especially if there are several useful indexes on the table. That’s because SQL Server can make the determination of what indexes to use at runtime based on the parameters needed, rather than using a single stored plan for every possible combination of parameters. I also don’t have to worry about injection attacks, because it’s impossible to put something into any parameter that will be anything other than a search argument, and that will execute any code other than what I expect. Basically, this version of a stored procedure is the answer to the flexibility of using ad hoc SQL, though it is a bit messier to write. However, it is located right on the server where it can be tweaked as necessary.

Try executing the evil version of the query, and look at the WHERE clause it fashions:

WHERE   1=1
      AND People.FullName like ’~’’select name from sysusers--’

The query that is formed, when executed, will now just return two result sets (one for the query and one for the results), and no rows for the executed query. This is because you are looking for rows where People.FullName is like ~’select name from sysusers--. While not being exactly impossible, this is certainly very, very unlikely.

I should also note that in versions of SQL Server before 2005, using dynamic SQL procedures would break the security chain, and you’d have to grant a lot of extra rights to objects just used in a stored procedure. This little fact was enough to make using dynamic SQL not a best practice for SQL Server 2000 and earlier versions. However, in SQL Server 2005 you no longer had to grant these extra rights, as I’ll explain in the next section. (Hint: you can EXECUTE AS someone else.)

Security

My second most favorite reason for using stored-procedure access is security. You can grant access to just the stored procedure, instead of giving users the rights to all the different resources used by the stored procedure. Granting rights to all the objects in the database gives them the ability to open Management Studio and do the same things (and more) that the application allows them to do. This is rarely the desired effect, as an untrained user let loose on base tables can wreak havoc on the data. (“Oooh, I should change that. Oooh, I should delete that row. Hey, weren’t there more rows in this table before?”) I should note that if the database is properly designed, users can’t violate core structural business rules, but they can circumvent business rules in the middle tier and common procedures and can execute poorly formed queries that chew up important resources.

Image Note  In general, it is best to keep most of your users away from the power tools like Management Studio and keep them in a sandbox where even if they have advanced powers (like because they are CEO) they cannot accidentally see too much (and particularly modify and/or delete) data that they shouldn’t. Provide tools that hold users’ hands and keep them from shooting off their big toe (or really any toe for that matter).

With stored procedures, you have a far clearer surface area on which to manage security: a set of stored procedures, at pretty much any granularity desired, rather than tables, columns, groups of rows (row-level security), and actions (SELECT, UPDATE, INSERT, DELETE). You can give rights to just a single operation, done in a single way. For example, the question of whether users should be able to delete any contact is wide open, but should they be able to delete their own contacts? Sure, you can do that with row-level security, but it can get kind of complex to set up block filters. If instead we create a stored procedure like Contact$DeletePersonal (meaning delete a contact that the user owns), the decision about who gets that right becomes easy to make and easy to grant. Making choices this obvious depends on how well you name your procedures. I use a naming convention of <tablename | subject area>$<action>, as you have probably witnessed throughout the book. How you name objects is completely a personal choice, as long as you follow a standard that is meaningful to you and others.
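
To make the idea concrete, a sketch of such a procedure might look like the following; the CreatedByUser column is hypothetical, standing in for however your design records who owns a contact row:

CREATE PROCEDURE Sales.Contact$DeletePersonal
(
        @ContactId int
)
AS
BEGIN
        DELETE FROM Sales.Contact
        WHERE  ContactId = @ContactId
               --hypothetical ownership column; only rows the caller owns qualify
          AND  CreatedByUser = ORIGINAL_LOGIN();
END;

Granting EXECUTE on this one procedure gives the user exactly one well-named operation, with no rights to the table at all.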

As discussed back in the “Ad Hoc SQL” section, a lot of architects simply avoid this issue altogether (either by choice or as a result of political pressure) by letting objects connect to the database as a single user, leaving it up to the application to handle security. That can be an adequate method of implementing security, and the security implications of this are the same for stored procedures or ad hoc usage. Using stored procedures still clarifies what you can or cannot apply security to.

In SQL Server 2005, the EXECUTE AS clause was added on the procedure declaration. In versions before SQL Server 2005, if a different user owned any object in the procedure (or function, view, or trigger), the caller of the procedure had to have explicit rights to the resource. This was particularly annoying when having to do some small dynamic SQL operation in a procedure, as discussed in the previous section.

The EXECUTE AS clause gives the programmer of the procedure the ability to build procedures where the procedure caller has the same rights in the procedure code as the owner of the procedure—or if permissions have been granted, the same rights as any user or login in the system.

For example, consider that you need to do a dynamic SQL call to a table. (In reality, it ought to be more complex, like needing to use a sensitive resource.) Note that we covered this in more detail in Chapter 9, but I want to show a quick example here as it pertains to the value of stored procedures. First, create a test user:

CREATE USER Fred WITHOUT LOGIN;

Next, create a simple stored procedure:

CREATE PROCEDURE dbo.TestChaining
AS
EXECUTE ('SELECT PersonID, FullName, EmailAddress
          FROM   Application.People');
GO
GRANT EXECUTE ON dbo.TestChaining TO Fred;

Now execute the procedure (changing your security context to be this user):

EXECUTE AS USER = ’Fred’;
EXECUTE dbo.testChaining;
REVERT;

You’re greeted with the following error:

Msg 229, Level 14, State 5, Line 1
The SELECT permission was denied on the object ’People’, database ’WideWorldImporters’, schema ’Application’.

You could grant rights directly on the object to the user, but that gives the user access beyond just this procedure, which is precisely what we are trying to avoid by using stored procedures! Now change the procedure to use EXECUTE AS SELF:

ALTER PROCEDURE dbo.TestChaining
WITH EXECUTE AS SELF
AS
EXECUTE ('SELECT PersonID, FullName, EmailAddress
          FROM   Application.People');

Image Note  If you have restored this database from the Web, you may get an error message: Msg 15517, Cannot execute as the database principal because the principal "dbo" does not exist, this type of principal cannot be impersonated, or you do not have permission. This will occur if your database owner’s SID is not valid for the instance you are working on. Use ALTER AUTHORIZATION ON DATABASE::WideWorldImporters TO sa; to set the owner to the sa account. You can determine the current database owner with the query SELECT SUSER_SNAME(owner_sid) FROM sys.databases WHERE name = ’WideWorldImporters’;.

Now, go back to the context of user Fred and try again. This time, just as though Fred had been granted rights to the table directly, you get back data. You use SELF to set the context the same as the principal creating the procedure. OWNER is usually the same as SELF, and you can specify only a single user in the database (it can’t be a group). Warning: the EXECUTE AS clause can be abused if you are not extremely careful. Consider the following procedure, which is obviously a gross exaggeration of what you might hope to see but not beyond possibility:

CREATE PROCEDURE dbo.YouCanDoAnything_ButDontDoThis
(
    @query nvarchar(4000)
)
WITH EXECUTE AS SELF
AS
EXECUTE (@query);

This procedure gives the person who has access to execute it full access to the database. Bear in mind that any query can be executed (DROP TABLE? Sure, why not?), easily allowing improper code to be executed on the database. Now, consider a little math problem; add the following items:

  • EXECUTE AS SELF
  • Client executing code as the database owner (a very bad, yet very typical practice)
  • The code for dbo.YouCanDoAnything_ButDontDoThis
  • An injection-susceptible procedure, with a parameter that can hold approximately 120 characters (the length of the dbo.YouCanDoAnything_ButDontDoThis procedure plus punctuation to create it)

What do you get? If you guessed no danger at all, please e-mail me your Social Security number, address, and a major credit card number. If you realize that the only one that you really have control over is the fourth one and that hackers, once the dbo.YouCanDoAnything_ButDontDoThis procedure was created, could execute any code they wanted as the owner of the database, you get the gold star. So be careful to block code open to injection.

Image Tip  I am not suggesting that you should avoid the EXECUTE AS setting completely, just that its use must be scrutinized a bit more than the average stored procedure, much as you would scrutinize the use of a #temp table. Why was EXECUTE AS used? Is the use proper? You must understand that in the wrong hands this command can be harmful to security.

Performance

There are a couple of reasons why stored procedures are great for performance:

  • Parameterization of complex plans is controlled by you at design time rather than controlled by the optimizer at compile time.
  • You can performance-tune your queries without making invasive program changes.

Parameterization of Complex Plans

Stored procedures, unlike ad hoc SQL, can always have parameterized plans for maximum reuse of the plans. This lets you avoid the cost of recompilation, as well as the added cost of scanning the code for literals that could be treated as parameters. Any literal is always a literal, and any parameter is always a parameter. This can lead to some performance issues as well, as occasionally the plan the optimizer picks for a stored procedure might not be as good a plan as might be picked for an equivalent ad hoc query.

The interesting thing here is that, although you can save the plan of a single query with ad hoc calls, with procedures you can save the plan for a large number of statements. With all the join types, possible tables, indexes, view text expansions, and so on, optimizing a query is a nontrivial task that might take quite a few milliseconds. As user counts go up, the amount of time begins to add up. With stored procedures, this has to be done only once.

Stored procedure parameterization uses the parameter values that you pass the first time the procedure is called to create the plan. This process is known as parameter sniffing. This is great, though sometimes you get into a situation where some values work nicely for a query while others work pitifully; two different values being searched for can warrant two very different plans. Often, the problem arises when the first caller passes in a value that matches few or no rows, SQL Server uses that value to build the plan, and then a later call with a value that matches many rows takes far too long to execute. Using WITH RECOMPILE at the object level or the RECOMPILE statement-level hint can avoid the problems of parameter sniffing, but then you have to wait for the plan to be created for each execution, which can be costly. It’s possible to branch the code out to allow for both cases, but this can get costly if you have a couple of different scenarios to deal with. In still other cases, you can use an OPTIMIZE FOR hint to optimize for the common case when there are parameter values that produce less than adequate results, although presumably results that you can live with, not a plan that takes an hour or more to execute what normally takes milliseconds.
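
As a minimal sketch of the statement-level options just mentioned (using a hypothetical procedure name so as not to disturb the chapter’s running example):

CREATE PROCEDURE Sales.Orders$SelectSniffDemo
(
        @FullNameLike nvarchar(100) = N'%'
) AS
BEGIN
        SELECT People.FullName, Orders.OrderDate
        FROM   Sales.Orders
                 JOIN Application.People
                    ON Orders.ContactPersonID = People.PersonID
        WHERE  People.FullName LIKE @FullNameLike
        --build a fresh plan for the actual value on every execution...
        OPTION (RECOMPILE);
        --...or instead, always optimize for a chosen "typical" value:
        --OPTION (OPTIMIZE FOR (@FullNameLike = N'%'));
END;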

In some cases, the plan gets stale because of changes in the data, or even changes in the structure of the table. In stored procedures, SQL Server can recompile only the single statement in the plan that needs to be recompiled. While managing compilation and recompilation can seem a bit complicated, it really isn’t; there are a few caveats, but they are generally few and far between. You have several ways to manage the parameterization, and you have direct access to the code to change it. For the person who has to deal with parameter issues, or really any sort of tuning issue, the next topic is about tuning procedures without changing the procedure’s public interface; you can use the tricks mentioned here to fix a performance issue without the client’s knowledge.
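
And if you simply suspect that a cached plan has gone stale, you can flag the procedure so that the next execution compiles a fresh plan, without touching its code at all:

EXECUTE sp_recompile N'Sales.Orders$Select';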

Fine-Tuning Without Program Changes

Even if you didn’t have the performance capabilities of parameterization for stored procedures (say every query in your procedure was forced to do dynamic SQL), the ability to fine-tune the queries in the stored procedure without making any changes to the client code is of incredible value. Of course, this is the value of encapsulation, but again, fine-tuning is such an important thing.

Often, a third-party system is purchased that doesn’t use stored procedures. If you’re a support person for this type of application, you know that there’s little you can do other than to add an index here and there.

“But,” you’re probably thinking, “shouldn’t the third party have planned for all possible cases?” Sure they should have, because while the performance characteristics of a system with 10 rows might turn out to be identical to one with 10,000, there is the outlier, like the organization that pumped 10 million new rows per day into that system that was only expected to do 10,000 per day. The fact is, SQL Server is built to operate on a large variety of hardware in a large variety of conditions. A system running on a one-processor laptop with its slow disk subsystem behaves exactly like a RAID 10 system with 20 high-speed, 32GB solid-state drives, right? (Another gold star if you just said something witty about how dumb that sounded.)

The answer is obviously no. In general, the performance characteristics of database systems vary wildly based on usage characteristics, hardware, and data sizing issues. By using stored procedures, it’s possible to tweak how queries are written, as the needs change from small dataset to massive dataset. For example, I’ve seen many not so perfect queries that ran great with 10,000 rows, but when the needs grew to millions of rows, the queries ran for hours. Rewriting the queries using proper query techniques, or sometimes breaking the query up into multiple queries using temporary tables as intermediate steps, gave performance that was several orders of magnitude better. And I have had the converse be true, where I have removed temporary tables and consolidated queries into a single statement to get better performance. The user of the procedures did not even know.

This ability to fine-tune without program changes is very important. Particularly as a corporate developer, when the system is running slow and you identify the query or procedure causing the issue, fixing it can be either a one-hour task or a one-week task. If the problem is a procedure, you can modify, test, and distribute the one piece of code. Even following all of the rules of proper code management, your modify/test/distribute cycle can be a very fast operation. However, if application code has to be modified, you have to coordinate multiple groups (DBAs and programmers, at least), discuss the problem, and then rewrite and test the code (for many more permutations of parameters and settings than for the one procedure).

For smaller organizations, it can be overly expensive to maintain a really good test area, so if the code doesn’t work quite right in production, you can tune it there. Tuning a procedure is easy, even one that modifies data; you can just execute the code in a transaction and roll it back. Keep changing the code until satisfied, compile the code in the production database, and move on to the next problem. (This is not best practice, but it is something I have to do from time to time.)
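
The pattern is as simple as it sounds; a sketch, using a hypothetical Sales.Contact$Delete procedure to stand in for whatever module you are testing:

BEGIN TRANSACTION;
        EXECUTE Sales.Contact$Delete @ContactId = 1; --hypothetical procedure
        SELECT COUNT(*) AS RemainingContacts         --inspect the effect
        FROM   Sales.Contact;
ROLLBACK TRANSACTION;                                --undo everything it did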

Beyond performance, you can also fine-tune things about how the system works. For example, say you are receiving a lot of error messages, but are having a lot of trouble finding out where the error occurs. In the trigger templates, we include error logging that will capture the error that has occurred in a table. This can also be done in stored procedures. Matt Martin (www.sqletl.com) has an entire framework built around this concept of having procedures that log the error messages to a table and enabling those messages to be e-mailed when an error occurs.
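
A pared-down sketch of that idea, using a hypothetical dbo.ErrorLog table, looks something like the following; inside a procedure, the CATCH block is where the logging (and any e-mail hook) would go:

CREATE TABLE dbo.ErrorLog --hypothetical logging table
(
        ErrorLogId   int IDENTITY(1,1) CONSTRAINT PKErrorLog PRIMARY KEY,
        ObjectName   sysname NULL, --populated when logging from inside a module
        ErrorNumber  int NOT NULL,
        ErrorMessage nvarchar(4000) NOT NULL,
        LogTime      datetime2(0) NOT NULL
                       CONSTRAINT DFLTErrorLog_LogTime DEFAULT (SYSDATETIME())
);
GO
BEGIN TRY
        SELECT 1/0 AS CauseFailure; --something that will fail
END TRY
BEGIN CATCH
        INSERT INTO dbo.ErrorLog (ObjectName, ErrorNumber, ErrorMessage)
        VALUES (OBJECT_NAME(@@PROCID), ERROR_NUMBER(), ERROR_MESSAGE());
        THROW; --still let the caller know something went wrong
END CATCH;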

The primary thing to realize is that the ability to fine-tune the data layer without affecting any changes to the other layers of code is a fantastic benefit. Of course, don’t just do it without the testing and change control (and source control!) that you would apply to any other change. It is just quicker to regression test a single module, prove that you haven’t changed the public interface (the inputs and outputs look exactly the same), and roll it out to production.

Ability to Use the In-Memory Engine to Its Fullest

Of all of the arguments I have made to this point of the chapter, this one is the most compelling in my opinion, and it really goes hand in hand with the previous section. As discussed in the first major section of this chapter, there are two engines in SQL Server for data storage, and there are two engines for executing SQL statements. The first is an interpreted engine that compiles your code into high-level commands that are executed and can be tweaked at runtime (hence the reason we have both an estimated plan and an actual plan to look at when tuning queries). If you access in-memory tables using interpreted T-SQL, the access happens in interop mode. You still get benefits such as the lock-free, latch-free, multiversioned concurrency control, but the statements are still interpreted.

However, if you want to get the full potential out of the in-memory engine, you need to use natively compiled procedures. Interpreted code is interesting because it is late bound. The procedure (or function/trigger) references objects by object_id in the plan, but if that object_id changes, it can recompile by name. If you create an object and reference a name that doesn’t exist, then it doesn’t fail until you execute it.

Natively compiled objects are compiled into machine code, just like if you had written C# and compiled it (as are the in-memory tables you have created, by the way), and they reference all resources they use with strong connections. As such, natively compiled objects must all be created WITH SCHEMABINDING, which ensures that the objects they reference cannot be changed out from under them while the reference exists. (This can make altering objects much more difficult, so it makes source control and scripts far more important.)

Every natively compiled stored procedure has one other interesting feature: it executes as an atomic block, which means that everything in the procedure completes, or it all fails.

The only real downside of natively compiled objects is the limitations. First, they can only access other natively compiled objects and in-memory tables. Second, as of SQL Server 2016, the limitations on available syntax are still quite extensive. In the trigger examples in the downloads for Chapter 6, you will see code written in what would be horrible form for typical T-SQL, but the penalty for looping code is considerably lower due to the way natively compiled code of all types behaves.

As examples, I will create two quick procedures that show you the basic format of natively compiled objects and demonstrate the atomic block feature. This is not a programming book, so I will not dive too deep, but you should at least understand the basics so that you appreciate the discussion of the pros and cons of stored procedures, and why natively compiled objects can be tremendous for your performance—as long as you can live with the concurrency issues we discussed in Chapter 11, and the code limitations you can read about here: msdn.microsoft.com/en-us/library/dn452279.aspx. (It is quite telling that the article is, as of 2016, titled “Supported Features for Natively Compiled T-SQL Modules” rather than referring to some list of limitations.)

The procedures I will build will use the Warehouse.VehicleTemperatures table in the WideWorldImporters database we have been using in previous chapters. The first, very simple procedure selects rows from the VehicleTemperatures table where a temperature is within a range, based on a couple of parameters:

USE WideWorldImporters;
GO
CREATE PROCEDURE Warehouse.VehicleTemperatures$Select  
(
        @TemperatureLowRange decimal(10,2) = -99999999.99,
        @TemperatureHighRange decimal(10,2) = 99999999.99
)
WITH SCHEMABINDING, NATIVE_COMPILATION  AS  
  BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N’us_english’)  
        SELECT VehicleTemperatureID, VehicleRegistration,
               RecordedWhen, Temperature
        FROM   Warehouse.VehicleTemperatures
        WHERE  Temperature BETWEEN @TemperatureLowRange AND @TemperatureHighRange
        ORDER BY RecordedWhen DESC; --Most Recent First
  END;  

There are a few interesting things to point out. I discussed SCHEMABINDING already, and NATIVE_COMPILATION is obviously telling the engine to use the native compilation engine. BEGIN ATOMIC is a new construct, and it basically says this entire object operates as an atomic unit. If one thing fails, the whole thing does. You specify an isolation level for the atomic block, SNAPSHOT, REPEATABLE READ, or SERIALIZABLE, and they work just like described in Chapter 11. Finally, you have to give it a language to compile under, because it doesn’t default to the local language. This goes to the compiler, just as if you were building your own DLL in a less awesome language.

I compile this and then can execute the procedures just like any other stored procedure:

EXECUTE Warehouse.VehicleTemperatures$Select ;
EXECUTE Warehouse.VehicleTemperatures$Select @TemperatureLowRange = 4;
EXECUTE Warehouse.VehicleTemperatures$Select @TemperatureLowRange = 4.1,
                                             @TemperatureHighRange = 4.1;

Each of these calls will return a result set shaped like the following:

VehicleTemperatureID VehicleRegistration  RecordedWhen                Temperature
-------------------- -------------------- --------------------------- ---------------
65270                WWI-321-A            2016-05-30 10:04:45.0000000 4.10
65227                WWI-321-A            2016-05-30 09:18:14.0000000 4.10
64942                WWI-321-A            2016-05-29 12:37:20.0000000 4.10
64706                WWI-321-A            2016-05-29 07:44:12.0000000 4.10
...                  ...                  ...                         ...

Next, I will build a stored procedure that will change a row of data, with a bit of trick code that causes an error if a temperature of 0 is passed in, plus a parameter that controls whether the error is rethrown to the caller:

CREATE PROCEDURE Warehouse.VehicleTemperatures$FixTemperature  
(
        @VehicleTemperatureID int,
        @Temperature decimal(10,2),
        @ThrowErrorFlag bit = 1 --1: rethrow the error to the caller; 0: swallow it
)
WITH SCHEMABINDING, NATIVE_COMPILATION AS  
--Simulating a procedure you might write to fix a temperature that was found to be
--outside of reasonability
  BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')  
    BEGIN TRY
                --Update the temperature
                UPDATE Warehouse.VehicleTemperatures
                SET    Temperature = @Temperature
                WHERE  VehicleTemperatureID = @VehicleTemperatureID;
                --give the ability to crash the procedure for demo
                --Note, actually coding 1/0 is stopped by the compiler
                DECLARE @CauseFailure int;
                SET @CauseFailure = 1/@Temperature;
                --return data if not a fail
                SELECT 'Success' AS Status, VehicleTemperatureID,
                       Temperature
                FROM   Warehouse.VehicleTemperatures
                WHERE  VehicleTemperatureID = @VehicleTemperatureID;
        END TRY
        BEGIN CATCH
                --return data for the fail
                SELECT 'Failure' AS Status, VehicleTemperatureID,
                       Temperature
                FROM   Warehouse.VehicleTemperatures
                WHERE  VehicleTemperatureID = @VehicleTemperatureID;
                IF @ThrowErrorFlag = 1
                   THROW; --This will cause the batch to stop, and will cause this
                          --atomic block's changes to not be committed. You cannot
                          --use ROLLBACK here, and the THROW does not necessarily
                          --end an externally started transaction, even though it
                          --ends the batch.
        END CATCH;
  END;  

I included the SELECT in the CATCH block to show the state of the data at the time of the error, and the conditional THROW to show that the TRY...CATCH construct works and that you can do things such as logging errors before rethrowing (or not rethrowing). To show it working, I can use the following batches:

--Show original value of temperature for a given row
SELECT Temperature
FROM   Warehouse.VehicleTemperatures
WHERE  VehicleTemperatureID = 65994;

This shows the original value:

Temperature
---------------------------------------
4.18

Now, I execute the procedure with a value that will work:

EXECUTE Warehouse.VehicleTemperatures$FixTemperature
                                    @VehicleTemperatureId = 65994,
                                    @Temperature = 4.2;

This will return

Status  VehicleTemperatureID Temperature
------- -------------------- ------------------
Success 65994                4.20

So that worked just fine. Now, I cause an error by sending a 0 value:

EXECUTE Warehouse.VehicleTemperatures$FixTemperature                                                                    
                               @VehicleTemperatureId = 65994,                                                 
                               @Temperature = 0;

This causes the following error, though you can see from the result set that the UPDATE itself had succeeded before the divide-by-zero error occurred:

Status  VehicleTemperatureID Temperature
------- -------------------- ---------------
Failure 65994                0.00
Msg 8134, Level 16, State 0, Procedure VehicleTemperatures$FixTemperature, Line 18
Divide by zero error encountered.

Next, I use the @ThrowErrorFlag parameter to show what happens when I suppress the error:

EXECUTE Warehouse.VehicleTemperatures$FixTemperature                                                                    
                               @VehicleTemperatureId = 65994,                                                 
                               @Temperature = 0,
                               @ThrowErrorFlag = 0;

It still says failure, but provides no error message:

Status  VehicleTemperatureID Temperature
------- -------------------- ---------------
Failure 65994                0.00

I check the actual value in the table, and it is 0.00; because the error was not rethrown, the atomic block committed the change. Contrast that with the previous execution, where the error from the THROW statement killed the batch and the transaction, so the update did not persist. Before continuing, I execute the procedure again to set the value back to 4.20 and make sure it took.

Next, let’s look at what happens inside a transaction. I start a transaction, let the error occur, and see if I am still in a transaction:

SELECT @@TRANCOUNT AS TranStart;
BEGIN TRANSACTION
EXECUTE Warehouse.VehicleTemperatures$FixTemperature                                                                 
                             @VehicleTemperatureId = 65994,                                     
                             @Temperature = 0,
                             @ThrowErrorFlag = 1;
GO
SELECT @@TRANCOUNT AS TranEnd;
GO

The data is updated inside the atomic block, we don’t roll back in the procedure, and we just throw an error. The transaction that was started externally is not rolled back:

TranStart
-----------
0
Status  VehicleTemperatureID Temperature
------- -------------------- ---------------------------------------
Failure 65994                0.00
Msg 8134, Level 16, State 0, Procedure VehicleTemperatures$FixTemperature, Line 18
Divide by zero error encountered.
TranEnd
-----------
1

Now, before I roll back the external transaction, I check the value:

SELECT Temperature
FROM   Warehouse.VehicleTemperatures
WHERE  VehicleTemperatureID = 65994;

The value has not changed, because the error was thrown:

Temperature
---------------------------------------
4.20

If I make the procedure ignore the error (by passing @ThrowErrorFlag = 0), the data is changed, and since the external transaction is still open, I can then roll it back with the ROLLBACK command.

So, programming natively compiled code is very much the same as normal T-SQL, but there are some subtle, yet big differences you will need to contend with. For some things that you cannot use natively compiled code for, you can use a wrapper of interpreted code. For example, if you wanted to use a SEQUENCE object for a surrogate key, the pseudocode might be as follows:

CREATE PROCEDURE interpretedCode
   @columns datatype
AS --ignoring any error handling you might do
   DECLARE @surrogateKeyValue int = NEXT VALUE FOR NotNative_SEQUENCE;
   EXEC nativeCode @surrogateKeyValue = @surrogateKeyValue,
                   @columns = @columns;

This way you can do as much natively as possible, but still use the constructs you need that are not currently supported.

Pitfalls

So far, everything has been all sunshine and lollipops for using stored procedures, but this isn’t always the case. We need to consider the following pitfalls:

  • The high initial effort to create procedures can be prohibitive.
  • It isn’t always easy to implement optional parameters in searches in an optimum manner.
  • It’s more difficult to affect only certain columns in an operation.

Another pitfall, which I won’t cover in detail here, is cross-platform coding. If you’re going to build a data layer that needs to be portable to different platforms such as Oracle or MySQL, this need for cross-platform coding can complicate your effort, although it can still be worthwhile in some cases.

High Initial Effort

Of all the pros and cons, the high initial effort to create procedures is most often the straw that breaks the camel’s proverbial back in the argument for or against stored procedures. For every time I’ve failed to get stored procedure access established as the method of access, this has been the reason given. There are many tools out there that can map a database to objects or screens to reduce development time. The problem is that they suffer from some or all of the issues discussed in the ad hoc SQL pitfalls. And with natively compiled procedures a real migration possibility in the coming years, making stored procedures a point of emphasis (even with a hybrid approach, using native code where you can and interpreted code where you must) is a worthwhile target.

There is no defending the claim that writing lots of stored procedures takes less time up front; quite often, it takes quite a bit more time for initial development. Writing stored procedures is definitely an extra step in the process of getting an application up and running.

An extra step takes extra time, and extra time means extra money. You see where this is going, because people like activities where they see results, not infrastructure. When a charismatic new programmer comes in and promises fantastical results, it can be hard to back up claims that stored procedures are certainly the best way to go. The best defenses are knowing the pros and cons and, especially, understanding the application development infrastructure you’ll be dealing with.

Difficulty Supporting Optional Parameters in Searches

I already mentioned something similar to optional parameters earlier when talking about dynamic SQL. In those examples, all of the optional parameters were simple LIKE comparisons on character strings. But what about integer values? Or numeric ones? As mentioned earlier in the “Ad Hoc SQL” section, a possible solution is to pass NULL into the variable values by doing something along the lines of the following code:

WHERE  (integerColumn = @integerColumn or @integerColumn is null)
  AND  (numericColumn = @numericColumn or @numericColumn is null)
  AND  (characterColumn like @characterColumn);

Generally speaking, it’s possible to come up with some scheme along these lines to implement optional parameters within the rigid needs of stored procedures. Note too that using NULL as your “get everything” parameter value means it is then hard to ask for only NULL values. For character strings, you can use LIKE ’%’, but that isn’t necessarily the best-performing construct either, since a user could pass in ’%value%’ and the query would then need to scan all rows, so the plan has to be ready to be merely “good enough.” You can even add parameters such as a @returnAllTypeFlag to explicitly request all rows of a certain type.

It isn’t possible to come up with a perfect scheme, especially a scheme that can be optimized. However, you can always fall back on using dynamic SQL for these types of queries using optional parameters, just like I did in the “Ad Hoc SQL” section. One thing that can help this process is to add the WITH RECOMPILE clause to the stored-procedure declaration. This tells the procedure to create a new plan for every execution of the procedure.
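
Putting the two ideas together, a sketch of an optional-parameter search over the Sales.Contact table (the procedure name is hypothetical) might look like this:

CREATE PROCEDURE Sales.Contact$SearchSketch
(
        @FirstNameLike varchar(30) = NULL,
        @SalesLevelId  int = NULL
)
WITH RECOMPILE --build a new plan for every execution
AS
SELECT ContactId, FirstName, LastName
FROM   Sales.Contact
WHERE  (FirstName LIKE @FirstNameLike OR @FirstNameLike IS NULL)
  AND  (SalesLevelId = @SalesLevelId OR @SalesLevelId IS NULL);

Because a new plan is built for the actual values on every call, the plan never gets stuck optimizing for one combination of parameters, at the cost of a compile on each execution.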

Although I try to avoid dynamic SQL because of the coding complexity and maintenance difficulties, if the set of columns you need to deal with is large, dynamic SQL can be the best way to handle the situation. Using dynamically built stored procedures is generally the same speed as using ad hoc access from the client, so the benefits from encapsulation still exist.

Difficulty Affecting Only Certain Columns in an Operation

When you’re coding stored procedures without dynamic SQL, the code you’ll write is going to be pretty rigid. If you want to write a stored procedure to modify a row in the table created earlier in the chapter—Sales.Contact—you’d write something along the lines of this skeleton procedure (back in the Chapter13 database):

CREATE PROCEDURE Sales.Contact$Update
(
    @ContactId   int,
    @FirstName   varchar(30),
    @LastName    varchar(30),
    @CompanyName varchar(100),
    @SalesLevelId  int,
    @ContactNotes  varchar(max)
)
AS
    DECLARE @entryTrancount int = @@TRANCOUNT;
    BEGIN TRY
          UPDATE Sales.Contact
          SET    FirstName = @FirstName,
                 LastName = @LastName,
                 CompanyName = @CompanyName,
                 SalesLevelId = @SalesLevelId,
                 ContactNotes = @ContactNotes
          WHERE  ContactId = @ContactId;
    END TRY
    BEGIN CATCH
      IF @@TRANCOUNT > 0
           ROLLBACK TRANSACTION;
      DECLARE @ERRORmessage nvarchar(4000)
      SET @ERRORmessage = ’Error occurred in procedure ’’’ +
                  OBJECT_NAME(@@procid) + ’’’, Original Message: ’’’
                 + ERROR_MESSAGE() + ’’’’;
      THROW 50000,@ERRORmessage,1;
   END CATCH;

A procedure such as this is fine most of the time, because it usually isn’t a big performance concern just to pass all values and modify those values, even setting them to the same value and revalidating. However, in some cases, validating every column can be a performance issue because not every validation is the same as the next.

For example, suppose that the SalesLevelId column is a very important column for the corporate sales process and needs to validate in the sales data if the customer should, in fact, actually be at the level assigned. A trigger might be created to do that validation, and it could take a relatively large amount of time. Note that when the average operation takes 1 millisecond, 100 milliseconds can actually be “a long time.” It is all relative to what else is taking place and how many times a minute things are occurring. You could easily turn this into a dynamic SQL procedure, though since you don’t know if the value of SalesLevelId has changed, you will have to check that first:

ALTER PROCEDURE Sales.Contact$update
(
    @ContactId   int,
    @FirstName   varchar(30),
    @LastName    varchar(30),
    @CompanyName varchar(100),
    @SalesLevelId  int,
    @ContactNotes  varchar(max)
)
WITH EXECUTE AS SELF
AS
    DECLARE @entryTrancount int = @@trancount;
    BEGIN TRY
       --declare variable to use to tell whether to include the sales level
       DECLARE @salesLevelIdChangedFlag bit =
                       CASE WHEN (SELECT SalesLevelId
                                  FROM   Sales.Contact
                                  WHERE  ContactId = @ContactId) =
                                                             @SalesLevelId
                            THEN 0 ELSE 1 END;
        DECLARE @query nvarchar(max);
        SET @query = '
        UPDATE Sales.Contact
        SET    FirstName = ' + QUOTENAME(@FirstName,'''') + ',
               LastName = ' + QUOTENAME(@LastName,'''') + ',
               CompanyName = ' + QUOTENAME(@CompanyName, '''') + ',
                ' + CASE WHEN @salesLevelIdChangedFlag = 1 THEN
                        --only include the costly column when it has changed
                        'SalesLevelId = ' +
                           QUOTENAME(CAST(@SalesLevelId AS varchar(10)),'''') + ','
                        ELSE '' END + '
                ContactNotes = ' + QUOTENAME(@ContactNotes,'''') + '
         WHERE  ContactId = ' + CAST(@ContactId AS varchar(10));
         EXECUTE (@query);
    END TRY
    BEGIN CATCH
      IF @@TRANCOUNT > 0
           ROLLBACK TRANSACTION;
      DECLARE @ERRORmessage nvarchar(4000);
      SET @ERRORmessage = 'Error occurred in procedure ''' +
                  OBJECT_NAME(@@procid) + ''', Original Message: '''
                 + ERROR_MESSAGE() + '''';
      THROW 50000,@ERRORmessage,1;
   END CATCH;

This is a pretty simple example, and as you can see the code is already getting pretty darn ugly. Of course, the advantage of encapsulation is still intact, since the user will be able to do exactly the same operation as before with no change to the public interface, but the code is immediately less manageable at the module level.

An alternative you might consider would be an INSTEAD OF trigger to conditionally do the update on the column in question if the inserted and deleted columns don’t match:

CREATE TRIGGER Sales.Contact$insteadOfUpdate
ON Sales.Contact
INSTEAD OF UPDATE
AS
BEGIN
   SET NOCOUNT ON;
   SET ROWCOUNT 0; --in case the client has modified the rowcount
   --use inserted for insert or update trigger, deleted for update or delete trigger
   --count instead of @@rowcount due to merge behavior that sets @@rowcount to a number
   --that is equal to number of merged rows, not rows being checked in trigger
   DECLARE @msg varchar(2000),    --used to hold the error message
           @rowsAffected int = (SELECT COUNT(*) FROM inserted);
   --           @rowsAffected int = (SELECT COUNT(*) FROM deleted);
   --no need to continue on if no rows affected
   IF @rowsAffected = 0 RETURN;
   BEGIN TRY
          --[validation blocks]
          --[modification blocks]
          --<perform action>
          UPDATE Contact
          SET    FirstName = inserted.FirstName,
                 LastName = inserted.LastName,
                 CompanyName = inserted.CompanyName,
                 ContactNotes = inserted.ContactNotes
          FROM   Sales.Contact AS Contact
                    JOIN inserted
                        ON inserted.ContactId = Contact.ContactId;
          IF UPDATE(SalesLevelId) --this column requires heavy validation
                                  --only want to update if necessary
               UPDATE Contact
               SET    SalesLevelId = inserted.SalesLevelId
               FROM   Sales.Contact AS Contact
                         JOIN inserted
                              ON inserted.ContactId = Contact.ContactId
              --this correlated subquery checks for values that have changed
              WHERE  EXISTS (SELECT *
                             FROM   deleted
                             WHERE  deleted.ContactId =
                                             inserted.ContactId
                               AND  deleted.SalesLevelId <>
                                             inserted.SalesLevelId);
   END TRY
   BEGIN CATCH
               IF @@trancount > 0
                     ROLLBACK TRANSACTION;
              THROW;
     END CATCH
END;

This is a lot of code, but it’s simple to follow: update the low-cost columns unconditionally, and update the “high cost” column only when its value has actually changed. It is one of the rare cases where an INSTEAD OF trigger earns its keep. The point is that the more encapsulated the data layer is from the client, the more freedom you have to adapt the code to your needs. The stored procedure layer can be treated as a set of modules that return a set of data, save the state of some data, and so forth, without the client needing to know anything about the structure of anything other than the parameters and tabular data streams that are to be returned.

Stored Procedure or Ad Hoc?

If the opinions in the previous sections on stored procedures and ad hoc SQL were not enough (and they weren’t), this section lays out what I see as good and bad about each approach. As Oscar Wilde was quoted as saying, “It is only about things that do not interest one that one can give a really unbiased opinion, which is no doubt the reason why an unbiased opinion is always absolutely valueless.” This is a topic that I care about, and I have firm feelings about what is right and wrong. Of course, it is also true that many viable, profitable, and stable systems don’t follow any of these opinions. That said, let’s recap the pros and cons I have given for the different approaches. The pros of using ad hoc SQL are as follows:

  • It gives a great deal of flexibility over the code, as the code can be generated right at runtime, based on metadata, or even the user’s desires. The modification statement can only update column values that have changed.
  • It can give adequate or even improved performance by only caching and parameterizing obviously matching queries. It also can be much easier to tailor queries in which you have wildly varying parameter and join needs.
  • It’s fast to develop. If the programmer can write a SQL statement, or use an API that writes one, there’s less overhead in learning how to write stored procedures.

The cons of ad hoc SQL access are as follows:

  • Your client code and database structures are tightly coupled, and when any little thing changes in the database (a column name or type, for example), it often requires a change to the client code as well, increasing the cost of deploying changes.
  • Tying multiple statements together can be cumbersome, especially when transactions are required.
  • API-generated queries often are not optimal, causing performance and especially maintenance issues when the database administrator has to optimize queries that cannot be easily modified.
  • Performance tuning database calls can be much harder to do, because modifying a statement, even to add a query hint, requires a recompile.

For stored procedure access, the pros are as follows:

  • The encapsulation of database code reduces what the user interface needs to know about the implemented database structures. If they need to change, often you can change the structures and tweak a procedure, and the client needn’t know.
  • You can easily manage security at the procedure level, with no need whatsoever to grant rights to base tables. This way, users don’t have to have rights to any physical tables.
  • You have the ability to do dynamic SQL in your procedures. And you can do this without the need to grant rights to objects using EXECUTE AS.
  • Performance is improved, due to the parameterizing of all plans (unless you specifically ask otherwise with a RECOMPILE hint or object setting).
  • Performance tuning is made simpler, due to the ability to know the surface area to tune. Tuning can be done without the client knowing the difference.
  • Taking this performance-tuning argument one step further, stored procedures will help you make the leap to fully compiled native code in the future, which can increase throughput by an order of magnitude.

The cons for stored-procedure access are as follows:

  • The rigid code of precompiled stored procedures can make coding them difficult.
  • You can’t effectively vary the columns affected by any T-SQL statement.
  • There’s a larger initial effort required to create the procedures.

With no outside influence other than this list of pros and cons and experience, I can state without hesitation that stored procedures are the way to go, if for no other reason than the encapsulation angle. By separating the database code from the client code, you get an effective separation of data-manipulation code from presentation code. But “no outside influence” is a pipe dream, as developers will have their own ideas, and to be realistic, I am obviously open to the value of an ORM-type tool that encapsulates a lot of the work of application building as well. Keep in mind, though, that development costs are dwarfed by maintenance costs; a system outage means users aren’t producing but are still getting paid.

I must also note that I’m not suggesting that all code that works with data should be in stored procedures. Too often when stored procedures are used as the complete data interface, the people doing the programming have a tendency to start putting all sorts of procedural code in the procedures, making them hard to write and hard to maintain. The next step is moaning that procedures are terrible, slow, and inflexible. This is often one of the sticking points between the two different opinions on how to do things. More or less, what’s called for when building a user interface is to build stored procedures that replace T-SQL statements that you would have built in an ad hoc manner, using T-SQL control of flow language at a minimum. Several types of code act on data that shouldn’t be in stored procedures or T-SQL:

  • Mutable business logic and rules: T-SQL is a rigid language that can be difficult to work with. Even writing CLR SQL Server objects (covered in the next section of this chapter) is unwieldy in comparison to building an adequate business layer in your application.
  • Formatting data: When you want to present a value in some format, it’s best to leave this to the presentation layer or user interface of the application. You should use SQL Server primarily to do set-based operations using basic DML, and have as little of the T-SQL control of flow language as possible.

Probably the biggest drawback to using procedures in earlier versions of SQL Server was eliminated in SQL Server 2005 by the EXECUTE AS clause on the procedure declaration. By carefully using the EXECUTE AS clause, you can change the security context under which a procedure executes when the ownership chain is broken. So, in any place where a dynamic call is needed, you can make that dynamic call, and to the user it will look exactly like a normal precompiled stored procedure, as long as you are careful to avoid the very dangerous SQL injection errors that are far too common.
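
As a minimal sketch of that pattern (the procedure name and the AppUserRole role are placeholders of my own, not objects from the chapter’s sample database), a dynamic search procedure can run under its creator’s security context while the caller only ever receives EXECUTE rights:

CREATE PROCEDURE Sales.Contact$SearchByColumn
(
    @ColumnName sysname,
    @Value      nvarchar(100)
)
WITH EXECUTE AS SELF --dynamic SQL breaks ownership chaining, so run the
                     --statement as the procedure's creator, not the caller
AS
    --QUOTENAME protects the column name from injection; the search value
    --stays a parameter of the dynamic statement
    DECLARE @query nvarchar(max) =
        N'SELECT ContactId, FirstName, LastName, CompanyName
          FROM   Sales.Contact
          WHERE  ' + QUOTENAME(@ColumnName) + N' = @Value';

    EXECUTE sp_executesql @query, N'@Value nvarchar(100)', @Value = @Value;
GO

--callers need rights to the procedure only, never to Sales.Contact itself
GRANT EXECUTE ON Sales.Contact$SearchByColumn TO AppUserRole;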

I don’t want to sound as if any system largely based on ad hoc SQL is permanently flawed and just a festering pile of the smelly bits of a nerf-herding orangutan. Many systems have been built on letting the application handle all the code, especially when tools are built that need to run on multiple platforms. This kind of access is exactly what SQL-based servers were originally designed for, so it isn’t going to hurt anything. At worst, you’re simply not using one of the advanced features that SQL Server gives you in stored procedures.

The one thing that often tips the scales to using ad hoc access is time. The initial effort required to build stored procedures is going to be increased over just using ad hoc SQL generated from a mapping layer. In fact, for every system I’ve been involved with where our access plan was to use ad hoc SQL, the primary factor was time: “It takes too long to build the procedures,” or “It takes too long to develop code to access the stored procedures.” Or even, “The tool we are using doesn’t support stored procedures.” All this inevitably swings to the statement that “The DBA is being too rigid. Why do we want to…?”

These responses are a large part of why this section of the chapter needed to be written. It’s never good to state that the ad hoc SQL is just plain wrong, because that’s clearly not true. The issue is which is better, and stored procedures greatly tip the scale, at least until outside forces and developer talents are brought in.

T-SQL and the CLR

Many world-class and mission-critical corporate applications have been created using T-SQL and SQL Server, so why integrate SQL Server with the CLR? The fact is, integrating with the CLR provides a host of benefits to developers and DBAs that weren’t possible, or weren’t easy, with SQL Server 2000 and earlier. It also opens up a plethora of questions about the applicability of this reasonably new technology.

In most of this chapter, I have approached the topic of stored procedures and ad hoc access to data, but as of SQL Server 2005, there’s another interesting architectural option to consider. Beyond using T-SQL to code objects, you can write your objects in a .NET language so that they run not in the interpreted manner of T-SQL but in what is known as the SQLCLR, the version of the CLR hosted inside SQL Server, which lets the .NET languages build objects that SQL Server can use just like T-SQL objects.

Using the SQLCLR, Microsoft provides a choice in how to program SQL Server objects, offering the richer programming architecture of the CLR for work that doesn’t fit T-SQL well (and that may not be possible with the new natively compiled objects, due to their inability to access on-disk data). By hosting the CLR inside SQL Server, developers and DBAs can develop SQL Server objects using any .NET-compatible language, such as C# or Visual Basic. This opens up an entire new world of possibilities for programming SQL Server objects and makes the integration of the CLR one of the most powerful development features of SQL Server.

Back when the CLR was introduced to us database types, it was the most feared new feature of SQL Server. Now that SQL Server 2005 and later versions are the norm, use of the CLR may still invite some of the problems we suspected, but the fact is, properly built CLR objects need to follow many of the same principles as T-SQL objects, and the CLR can be very useful when you need it.

Microsoft chose to host the CLR inside SQL Server for many reasons; some of the most important motivations follow:

  • Rich language support: .NET integration allows developers and DBAs to use any .NET-compatible language for coding SQL Server objects. This includes such popular languages as C# and VB.NET.
  • Complex procedural logic and computations: T-SQL is great at set-based logic, but .NET languages are superior for procedural/functional code. .NET languages have enhanced looping constructs that are more flexible and perform far better than T-SQL. You can more easily factor .NET code into functions, and it has much better error handling than T-SQL. T-SQL has some computational commands, but .NET has a much larger selection of computational commands. Most important for complex code, .NET ultimately compiles into native code, while T-SQL is an interpreted language. This can result in huge performance wins for .NET code.
  • String manipulation, complex statistical calculations, custom encryption, and so on: As discussed earlier, heavy computational requirements such as string manipulation, complex statistical calculations, and custom encryption algorithms that don’t use the native SQL Server encryption fare better with .NET than with T-SQL in terms of both performance and flexibility.
  • .NET Framework classes: The .NET Framework provides a wealth of functionality within its many classes, including classes for data access, file access, registry access, network functions, XML, string manipulation, diagnostics, regular expressions, arrays, and encryption.
  • Leveraging existing skills: Developers familiar with .NET can be productive immediately in coding SQL Server objects. Familiarity with languages such as C# and VB.NET, as well as being familiar with the .NET Framework, is of great value. Microsoft has made the server-side data-access model in ADO.NET similar to the client-side model, using many of the same classes to ease the transition. This is a double-edged sword, as it’s necessary to determine where using .NET inside SQL Server provides an advantage over using T-SQL.
  • New SQL Server objects and functionality: If you want to create user-defined aggregates or user-defined types (UDTs) that extend the SQL Server type system, .NET is your only choice. You can’t create these objects with T-SQL. There’s also some functionality only available to .NET code that allows for streaming table-valued functions.
  • Integration with Visual Studio: Visual Studio is the premier development environment from Microsoft for developing .NET code. This environment has many productivity enhancements for developers. The Professional and higher editions also include a new SQL Server project, with code templates for developing SQL Server objects with .NET. These templates significantly ease the development of .NET SQL Server objects. Visual Studio .NET also makes it easier to debug and deploy .NET SQL Server objects.

However, while these are all good reasons for the concept of mixing the two platforms, it isn’t as if the CLR objects and T-SQL objects are equivalent. As such, it is important to consider the reasons that you might choose the CLR over T-SQL, and vice versa, when building objects. The inclusion of the CLR inside SQL Server offers an excellent enabling technology that brings with it power, flexibility, and design choices. And of course, as we DBA types are cautious people, there’s a concern that the CLR is unnecessary and will be misused by developers. Although any technology has the possibility of misuse, you shouldn’t dismiss the SQLCLR without consideration as to where it can be leveraged as an effective tool to improve your database designs.

What really makes the case for using the CLR for SQL Server objects is that, in some cases, T-SQL just does not provide natural access to the type of coding you need without looping and all sorts of machinations. T-SQL’s strength is its SQL queries and smooth handling of sets of data. In almost every case, if you can fashion a SQL query to do the work you need, T-SQL will be your best bet. However, once you have to start using cursors and/or T-SQL control-of-flow language (for example, looping through the characters of a string or through rows in a table), performance will suffer mightily, because T-SQL is an interpreted language. In a well-thought-out T-SQL object, you may have a few non-SQL statements, variable declarations, and so on. Those statements will not execute as fast as they could in a CLR object, but the difference will often be just milliseconds, if not microseconds.

I am not going to provide examples of CLR coding, because it really isn’t something that you will need for most implementations in a relational database. If you want to know more about the SQLCLR, I would suggest any of Adam Machanic’s writing/presentations on the subject (sqlblog.com/blogs/adam_machanic/archive/tags/SQLCLR/default.aspx), Itzik Ben-Gan et al.’s T-SQL Querying (Microsoft Press, 2015), or Solomon Rutzky’s Stairway to SQLCLR series on SQL Server Central (www.sqlservercentral.com/stairway/105855/).

Best Practices

The first half of the chapter discussed the two primary methods of architecting a SQL Server application, either by using stored procedures as the primary interface, or by using ad hoc calls built outside the server. Either is acceptable, but in my opinion the best way to go is to use stored procedures as much as possible. There are a few reasons:

  • Because stored procedures are precompiled batches of SQL statements known at design and implementation time, they give you a great interface to the database that encapsulates its details from the caller.
  • They can be a performance boost, primarily because tuning is on a known set of queries, and not just on any query that the programmer might have written that slips by untested (not even maliciously; it could just be a bit of functionality that only gets used “occasionally”).
  • They allow you to define a consistent interface for security that lets you give users access to a table in one situation but not in another. Plus, if procedures are consistently named, granting access to database resources is far easier (see the minimal sketch after this list).
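
For example, a minimal sketch of that security pattern, assuming a hypothetical application role named WebAppRole, is to grant EXECUTE at the schema level and nothing at all on the tables themselves:

CREATE ROLE WebAppRole;
--every current and future procedure in the Sales schema becomes available,
--with no rights granted on the underlying tables
GRANT EXECUTE ON SCHEMA::Sales TO WebAppRole;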

However, not every system is written using stored procedures. Ad hoc access can serve to build a fine system as well. You certainly can build a flexible architecture, but it can also lead to harder-to-maintain code that ends up with the client tools being tightly coupled with the database structures. At the very least, if you balk at the use of procedures, make sure to architect in a manner that makes tuning your queries reasonable without full regression testing of the application.

I wish I could give you definitive best practices, but there are so many possibilities, and either method has pros and cons. (Plus, there would be a very nerdy mob with cell phone torches and pitchforks at my door, no matter how I said things must be done.) This topic will continue to be hotly contested, and rightly so. In each of the last few releases of SQL Server, Microsoft has continued to improve the use of ad hoc SQL, but it’s still considered a best practice to use stored procedures if you can. I realize that in a large percentage of systems that are created, stored procedures are only used when there’s a compelling reason to do so (like some complex SQL, or perhaps to batch together statements for a transaction).

Whether you decide to use stored-procedure access or ad hoc calls instead, you’ll probably want to code some objects for use in the database. Starting with SQL Server 2005, there’s another interesting decision to make regarding what language and technology to use when building several of these database objects. The best practices for CLR usage are a bit more clear-cut:

  • User-defined functions: When there’s no data access, the CLR is almost always a better way to build user-defined functions; though the management issues usually make them a last choice unless there is a pressing performance issue. When data access is required, it will be dependent on the types of operations being done in the function, but most data access functions would be best at least done initially in T-SQL.
  • Stored procedures: For typical data-oriented stored procedures, T-SQL is usually the best course of action. On the other hand, when using the CLR, it’s far easier and much safer to create replacements for extended stored procedures (procedures typically named xp_) that do more than simply touch data.
  • User-defined types: For the most part, the advice here is to avoid them, unless you have a compelling reason to use them. For example, you might need complex datatypes that have operations defined between them (such as calculating the distance between two points) that can be encapsulated into the type. The client needs the datatype installed to get a natural interface; otherwise the clunky .NET-like methods need to be used (they aren’t SQL-like).
  • User-defined aggregates: You can only create these types of objects using .NET. User-defined aggregates allow for some interesting capabilities for operating on groups of data, like the example of a string aggregation.
  • Triggers: There seems to be little reason to use triggers built into a CLR language. Triggers are about data manipulation. Even with DDL triggers, the primary goal is usually to insert data into a table to make note of a situation.

Summary

In this chapter full of opinions, what’s clear is that SQL Server has continued to increase the number of options for writing code that accesses data. I’ve covered two topics that you need to consider when architecting your relational database applications using SQL Server. Designing the structure of the database is (reasonably) easy enough. Follow the principles set out by normalization to ensure that you have limited, if any, redundancy of data and limited anomalies when you modify or create data. On the other hand, once you have the database architected from an internal standpoint, you have to write code to access this data, and this is where you have a couple of seemingly difficult choices.

The case for using stored procedures is compelling (at least to many SQL architects), but it isn’t a definite. Many programmers use ad hoc T-SQL calls to access SQL Server (including those made from middleware tools), and this isn’t ever likely to change completely. This topic is frequently debated in blogs, forums, newsgroups, and church picnics with little budge from either side. I strongly suggest stored procedures for the reasons laid out in this chapter, but I do concede that it isn’t the only way.

I then introduced the CLR features and presented even more opinions about how and when to use them. I dare say that some of the opinions concerning the CLR in this chapter might shift a little over time, but so far, it remains the case that the CLR is going to be most valuable as a tool to supercharge parts of queries, especially in places where T-SQL is weak because it is interpreted at runtime rather than compiled. Usually this isn’t a problem, because decent T-SQL has few procedural statements, and all the real work is done in set-based SQL statements.

The primary thing to take from this chapter is that lots of tools are provided to access the data in your databases. Use them wisely, and your results will be excellent. Use them poorly, and your results will be poor. Hopefully this advice will be of value to you, but as you were warned at the start of the chapter, a good amount of this chapter was opinion.

Last, the decision about the data-access method (i.e., ad hoc SQL code versus stored procedures and how much to use the CLR) should be made for a given project up front, when considering high-level design. For the sake of consistency, I would hope that the decision would be enforced across all components of the application(s). Nothing is worse than having to figure out the application as you dig into it.

■ Note  A resource that I want to point out for further reading after this chapter is by Erland Sommarskog. His web site (www.sommarskog.se) contains a plethora of information regarding many of the topics I have covered in this chapter—and in far deeper detail. I would consider most of what I have said the introductory-level course, while his papers are nearly graduate-level courses in the topics he covers.
