© Elizabeth Noble 2020
E. NoblePro T-SQL 2019https://doi.org/10.1007/978-1-4842-5590-2_5

5. Set-Based Design

Elizabeth Noble1 
(1)
Roswell, GA, USA
 

Knowing how to design T-SQL is part of the foundation for writing professional code. Once you can write T-SQL code that can be easily understood, you can start to focus on improving the performance of your T-SQL code. If you learned T-SQL informally as part of your job or your primary function is writing code for software applications, you may not consciously consider set-based design when writing your T-SQL code. You may be an experienced database developer and want to learn more about set-based design.

In this chapter, I want to discuss how to work with your data. The first step is to get familiar with the various ways to interact with your data. Once you understand the different ways you can interact with your data, you want to start thinking about how to put the data in the best format for your queries. Then you should be able to write those queries in a way that takes advantages of SQL Server’s natural strengths.

Introduction to Set-Based Design

We live in a world where data is everywhere. Almost anything we do can generate data that needs to be saved. The first challenge in collecting data is determining how to store data. The second challenge is accessing that data. One of the reasons you will use SQL Server is to take advantage of the relationships between data. This is because SQL Server is a relational database management system (RDBMS). While our focus is writing T-SQL, it is important to also understand how the data is stored in SQL Server.

Going back to my example of a database for recipes, let’s look at what types of data can be associated with this type of information:
  • Name

  • Cost

  • Ingredients

  • Number of Servings

  • Preparation Time

  • Cooking Time

  • Type of Cuisine

  • Cooking Method

  • Type of Meal

I can group data together by how they describe other items. For instance, I can group information about recipes as shown in Table 5-1.
Table 5-1

General Attributes for Ingredients

Recipe

Ingredient

Cooking Method

Name

Name

Name

Cuisine

Unit of Measure

 

Number of Servings

Cost

 

Ingredients

  

Quantity of Ingredient

  
Once that information is grouped, the data can be stored by those groupings. If I were to start populating data for ingredients, the data could be stored in a table as shown in Table 5-2.
Table 5-2

General Ingredient Information

Ingredient Name

Unit(s)

Measurement

Cost

Diced Tomatoes

28

Ounces

$1.84

Tomato Paste

10

Tablespoon

$0.79

Tomato Sauce

8

Ounces

$0.55

Minced Garlic

60

Tablespoon

$5.99

The data organized in the table is a data set. Looking at the data in the first column, Ingredient Name, there are three records that contain the word tomato. There are also two rows that are measured in ounces.

One of the things that can be difficult when working with SQL Server is the use of set-based transactions. People come to SQL Server from other fields and professions. These include software development, networking, system administration, or general business operations. In each of these cases, a person may not have developed the skill set for thinking in set-based operations.

What are set-based operations? What do they mean? Why are they so important? Set-based operations are looking at data in terms of chunks or segments instead of looking at records or data row by row. To show you a comparison, Table 5-3 shows each data record individually.
Table 5-3

Ingredients by Individual Row

Ingredient Name

Diced Tomatoes

Ingredient Name

Tomato Paste

Ingredient Name

Tomato Sauce

Ingredient Name

Minced Garlic

If I were to select the data from Table 5-3 in this manner, it would be far less efficient than trying to select the data as a set. Table 5-4 shows what it would look like to select the data as a set.
Table 5-4

Ingredients as One Data Set

Ingredient Name

Diced Tomatoes

Tomato Paste

Tomato Sauce

Minced Garlic

Viewing the data like this allows you to also see the similarities between the data.

You are trying to figure out how to handle implementing a process or a task with multiple rows at the same time. You want to consider doing the same thing to all those rows at the same time. In the preceding example, I can quickly determine which records have the word tomato in common. The data is giving us a hint that if we want to look at or change something about ingredients with tomatoes, there may be more than one way to accomplish this goal. This type of scenario may indicate your logic should be structured in a way that it can function correctly by using algebraic logic instead of hard-coded values.

Procedural code is described as not only telling a system what to do but how to do it. In many companies the software developers or software engineers are the ones responsible for writing the T-SQL code. One of the things I have found is when writing application code, this code is designed to work best procedurally. Much of the code performs well when interacting with data iteratively. The cost of context switching between procedural application code and database T-SQL code can be significant. Overall the process of writing set-based code causes a very different thought process than dealing with the code iteratively.

Why are we even interested in using set-based design? Why is it worth the trouble? This has to do with how the SQL Server engine works and how data is retrieved for queries. When a query is issued, SQL Server needs to retrieve the data record. SQL Server goes to the buffer cache to see if this data is available. Figure 5-1 shows the process SQL Server goes through to retrieve data as part of a query request if the data is cached in the buffer pool.
../images/480547_1_En_5_Chapter/480547_1_En_5_Fig1_HTML.jpg
Figure 5-1

Data Retrieval from a Single Query

If the query is written procedurally, that is, to execute every time a new row is retrieved, SQL Server will need to go to the buffer cache for each record. The process of retrieving a single row at a time can increase the workload on SQL Server. In Figure 5-2, you can see the additional workload required to retrieve one row at a time.
../images/480547_1_En_5_Chapter/480547_1_En_5_Fig2_HTML.jpg
Figure 5-2

Data Retrieval when Query Retrieves One Row at a Time

Figure 5-2 shows the query process to retrieve a single record at a time with the data stored in the buffer pool. However, if the query is written for set-based design, then there will be one call to the buffer pool. On a small set of data, this may not be a very large impact. When dealing with data sets of thousands or millions of records, the difference in execution time can be drastic. There is also something else at play here that has to do with how the data is stored in the buffer cache. When SQL Server executes a query and pulls back data, it only pulls back data that exists in the buffer cache. Starting in SQL Server 2019, there is a feature called the hybrid buffer pool. The hybrid buffer pool allows both data stored on RAM and in persistent memory (PMEM) to be treated as being part of the buffer cache. Otherwise, if the data is not available in RAM or PMEM, then SQL Server will need to go to disk to retrieve the data. The data retrieved will then be stored in memory. Figure 5-3 shows how this data would be retrieved from disk and stored in the buffer cache.
../images/480547_1_En_5_Chapter/480547_1_En_5_Fig3_HTML.jpg
Figure 5-3

Data Retrieval from Disk to the Buffer Cache

When the query is executed, that data may not even be in the buffer cache at that point in time. SQL Server must go to the subsystem disk I/O to find that data. Once that data is found, that data is stored in the buffer cache.

If each row is pulled individually, that call to the disk I/O will have to happen individually for each record. Pulling data from the buffer cache is not instantaneous. Getting data from the disk I/O and putting it in the buffer cache to then be retrieved by the query is even more resource intensive. Ultimately this creates a scenario where you want to focus on set-based design to have the best possible performance for the queries that you are executing.

Thinking in Data Sets

Now that I have discussed what data sets are, I want you to be comfortable thinking about data in a way that embraces data sets. Before you start writing T-SQL code, it may save you time and energy by first understanding what information is stored, how the data is stored, and selecting or using that data in groups of similar characteristics. Working with data sets is also easier to accomplish when the underlying tables are designed to work best with data sets.

The real trick is learning how to think in terms of data sets. This means how to think about handling multiple records at once instead of a single record. This really requires an approach that is less arithmetic and more algebraic. In general, performing SELECT statements is the easiest way to get comfortable with set-based transactions. Consider the data shown in Table 5-5.
Table 5-5

Ingredient Data

Ingredient Name

Unit(s)

Measurement

Cost

Diced Tomatoes

28

Ounces

$1.85

Tomato Paste

10

Tablespoons

$0.79

Tomato Sauce

8

Ounces

$0.55

Minced Garlic

60

Tablespoon

$5.99

This is a list of possible ingredients for a recipe along with the number of units, the unit of measure, and the cost per unit. By looking at the preceding data, I can find similarities between the data in some of the columns.

I can create a data set that gets all records where the measurement is tablespoon. Those records are shown in Table 5-6.
Table 5-6

Ingredients with Measurement of Tablespoon

Ingredient Name

Unit(s)

Measurement

Cost

Tomato Paste

10

Tablespoons

$0.79

Minced Garlic

60

Tablespoon

$5.99

I could instead get the data set for all ingredients where the cost was less than $1.00 as shown in Table 5-7.
Table 5-7

Ingredients with a Cost Less Than $1.00

Ingredient Name

Unit(s)

Measurement

Cost

Tomato Paste

10

Tablespoons

$0.79

Tomato Sauce

8

Ounces

$0.55

Now that I have a set of data, I can consider how I want to interact with that data set. Knowing that I can group like data together and perform an action on that entire set of data is the key to using T-SQL code effectively. If I want to find out that cost per unit, I can divide the cost by the number of units. To calculate the cost per unit for each record, I can calculate the cost per unit manually for each record. In this case, the cost per unit for tomato paste would be $0.79/10. This would give the result of $0.08 per tablespoon. Alternatively, I can use the columns in the data sets to do the same calculation. To calculate the cost per unit as a data set, I can specify Cost/Unit(s). The result of handling the data in this fashion would be an output like the one shown in Table 5-8.
Table 5-8

Calculated Value for Cost per Unit of an Ingredient

Ingredient Name

Cost per Unit

Tomato Paste

$0.08 per tablespoon

Tomato Sauce

$0.07 per ounce

One of the greatest challenges working with SQL Server is learning how to think in data sets. The simplest way to think of data sets is to view the process of retrieving data as performing an action on a column or a subset of a column. If you can find some sort of pattern on how to perform that action on a subset of the data, then you are thinking in terms of data sets.

Looking at the ingredient table, I could create a data set where all ingredients had the word tomato in the ingredient name. All the available values in the Ingredient table are in Table 5-9.
Table 5-9

General Listing of Ingredients

IngredientID

Ingredient Name

Unit(s)

Measurement

Cost

1

Diced Tomatoes

28

Ounces

$1.85

2

Tomato Paste

10

Tablespoons

$0.79

3

Tomato Sauce

8

Ounces

$0.55

4

Minced Garlic

60

Tablespoon

$5.99

I have a couple different ways to do this. I can write a query that looks where the IngredientID is 1, 2, or 3. You can see the results of this logic shown in Table 5-10.
Table 5-10

Ingredients Containing the Work Tomato

IngredientID

Ingredient Name

Unit(s)

Measurement

Cost

1

Diced Tomatoes

28

Ounces

$1.85

2

Tomato Paste

10

Tablespoons

$0.79

3

Tomato Sauce

8

Ounces

$0.55

Going back to the preceding table, I want to add a new record for Tomato Soup. I show the new record added to Table 5-11.
Table 5-11

Adding Tomato Soup to Ingredients

IngredientID

Ingredient Name

Unit(s)

Measurement

Cost

1

Diced Tomatoes

28

Ounces

$1.85

2

Tomato Paste

10

Tablespoons

$0.79

3

Tomato Sauce

8

Ounces

$0.55

4

Minced Garlic

60

Tablespoon

$5.99

5

Tomato Soup

10.75

Ounces

$1.58

If I want to still show all records that have tomato and I use the same logic as earlier which is to use the IngredientID to display the desired values, I will not get the results I expect. If I did not change how the data was pulled back and used the same data to pull back records like I did in table, then I would get the following result set as shown in Table 5-12.
Table 5-12

Ingredients with Tomato Using Hard-Coded IngredientID

IngredientID

Ingredient Name

Unit(s)

Measurement

Cost

1

Diced Tomatoes

28

Ounces

$1.85

2

Tomato Paste

10

Tablespoons

$0.79

3

Tomato Sauce

8

Ounces

$0.55

Table 5-13 shows the expected results by changing my logic to look at the ingredient name to find all ingredients containing the word tomato.
Table 5-13

Ingredients with Tomato Searching on the Ingredient Name

IngredientID

Ingredient Name

Unit(s)

Measurement

Cost

1

Diced Tomatoes

28

Ounces

$1.85

2

Tomato Paste

10

Tablespoons

$0.79

3

Tomato Sauce

8

Ounces

$0.55

5

Tomato Soup

10.75

Ounces

$1.58

In that case, I would want to run a query where the ingredient name was like tomato. As you can see in this example, there are not only limitations with regard to data sets that affect performance but can also affect functionality.

One of the largest challenges in using data sets has to do with inserting data records. In most cases, when you insert a data record, you are inserting a single record at a time. This can cause it to become a habit to only deal with inserting one individual record at a time. When you are designing new table or moving data, you may have situations where you can think of an insert in terms of a data set. This will often happen when you are creating a query to insert data into another table.

While it does not seem like these situations will come up very often, I am frequently using data sets when I am populating temporary tables, table variables, and common table expressions.

Inserting data as a set uses the foundation of selecting data as data set. Once you a select data as a set, you can insert data as a set. I have used inserting data into an object like a temporary table for a couple of different reasons. One of the more common situations where I insert data sets is when I am joining several different tables and implementing criteria that make complicated calculations or functions. Looking at Table 5-14, you can see some examples of data that will be inserted into dbo.RecipeIngredient.
Table 5-14

Ingredients in a Recipe by Each Record

RecipeID

IngredientID

IsActive

Spaghetti

Diced Tomatoes

True

RecipeID

IngredientID

IsActive

Spaghetti

Tomato Paste

True

RecipeID

IngredientID

IsActive

Spaghetti

Tomato Sauce

True

RecipeID

IngredientID

IsActive

Spaghetti

Minced Garlic

True

While all these entries could be inserted individually, it is also possible to insert multiple records at once. In this case, it may look something like Table 5-15.
Table 5-15

Ingredients in a Recipe as a Data Set

RecipeID

IngredientID

IsActive

Spaghetti

Diced Tomatoes

True

Spaghetti

Tomato Paste

True

Spaghetti

Tomato Sauce

True

Spaghetti

Minced Garlic

True

Another scenario where I have found using bulk inserts into temporary objects is when those records may require additional modifications by joining to other queries that include multiple table joins.

If you are changing how your application code works and inserting new tables or changing how the value should be populated, you may find yourself needing to perform bulk updates. While you can perform bulk updates by updating one record at a time with hard-coded values, you will want to see if there is a way to update all the records in the data set by using the same code logic. However, this can be where data sets really help you out.

You can write one query to find all the records that you need to update and systematically perform all those updates. In addition to the advantages in terms of performance and functionality, this type of process also ensures that your data is being handled consistently.

Overall, the process for updating data as a data set is very much like the methods discussed for selecting and inserting data. In Table 5-16, you can see information stored in the dbo.Recipe table.
Table 5-16

Recipes with Servings and Meal Type

Recipe

Serving Quantity

Meal Type

Spaghetti

8

Dinner

Spinach Frittata

6

Breakfast

Roasted Chicken

4

Dinner

Dinner Rolls

12

Bread

While this is how the recipes are currently saved, I may decide that I want to change how meal types are stored. I may want to change the meal type by the main ingredient rather than by the time of day the meal was served. In that case, I could update the records one at a time as shown in Table 5-17.
Table 5-17

Recipes as Individual Records

Recipe

Serving Portion

Original Meal Type

New Meal Type

Spaghetti

8

Dinner

Meat

Recipe

Serving Portion

Original Meal Type

New Meal Type

Spinach Frittata

6

Breakfast

Egg

Recipe

Serving Portion

Original Meal Type

New Meal Type

Roasted Chicken

4

Dinner

Meat

Recipe

Serving Portion

Original Meal Type

New Meal Type

Dinner Rolls

12

Bread

Grain

If I used a simple form of logic to determine what records should be updated, I may be able to write the updates to run in the manner shown in Table 5-18.
Table 5-18

Recipes as Data Sets by Meal Type

Recipe

Serving Portion

Original Meal Type

New Meal Type

Spaghetti

8

Dinner

Meat

Roasted Chicken

4

Dinner

Meat

Recipe

Serving Portion

Original Meal Type

New Meal Type

Spinach Frittata

6

Breakfast

Egg

Recipe

Serving Portion

Original Meal Type

New Meal Type

Dinner Rolls

12

Bread

Grain

While these updates could be done in a with hard-coded values, there are times that it may be better to join the table you are updating to other related tables. This type of interaction between the tables may make it easier to perform your updates.

You could use the ingredients in the recipe to determine how to update the meal types on your recipes. It helps to think about the logic you would want to use. If you wanted to update the meal type to a generic category based upon the main ingredient per recipe, you would first need to figure out what the main ingredient was per recipe. You would also need to determine general categories for your meal type. In the preceding scenario, I might have Meat, Egg, Fruit, Vegetable, Grain, and Sugar. Once I have both pieces of information, I may create a temporary table to map the ingredients to a specific category, unless I have designed the ingredient table to record that specific category per ingredient.

In general, I found anytime that I am dealing with data one record of the time, I am often having to perform some type of manual interaction. I have found that manual interactions are the most likely to have issues or errors. I much prefer writing code in data sets, so that I can be more assured that my logic is being consistent.

Writing Code for Data Sets

Previously, I discussed how to start thinking about your data as data sets instead of individual records. I frequently use data sets when selecting, updating, or deleting data. While most application code may insert single records at a time, there are many common scenarios where inserting data as data set may also be helpful. In this next section, I will walk through various scenarios where you may want to think about your data as sets instead of individual records.

The most frequent use for data sets is when you want to view a portion of data. Ideally, these data sets are selected based upon specific criteria. This is not only good for performance but also can help make sure you are only looking at the specific data that you want. For most scenarios, it does not even make sense to retrieve and display one individual record at a time. Oftentimes, if you find yourself in a situation where you are selecting one record at a time, it is a good indication that you may want to see if that T-SQL code can be rewritten to use set-based logic.

There are some more complex ways that queries can be selected. In some cases, you may find yourself wanting to combine or compare two different data sets. If you want to join two data sets together, you have the option of UNION or UNION ALL. There is only one small but important difference between the two of these. When you UNION data, each record returned will be distinct between the SELECT statements that are joined together. Listing 5-1 shows a UNION between two queries.
SELECT IngredientID, IngredientName
FROM dbo.Ingredient
WHERE IngredientName LIKE '%Tomato%'
UNION
SELECT IngredientID, IngredientName
FROM dbo.Ingredient
WHERE Measurement = 'Ounces'
Listing 5-1

Union of Two Queries

Table 5-19 shows the data currently stored for ingredients. In Table 5-19, I have five ingredients that are all active.
Table 5-19

General Ingredients

IngredientID

IngredientName

Measurement

IsActive

1

Italian Sausage

Pound

1

2

Tomato Sauce

Tablespoon

1

3

Diced Tomatoes

Ounces

1

4

Tomato Paste

Ounces

1

5

Minced Garlic

Ounces

1

Using Table 5-20, the first query from the preceding figure would return the following values.
Table 5-20

Ingredients with Tomatoes

IngredientID

IngredientName

2

Tomato Sauce

3

Diced Tomatoes

4

Tomato Paste

The data from the second query from Table 5-21 would be the following.
Table 5-21

Ingredients Measured in Ounces

IngredientID

IngredientName

3

Diced Tomatoes

4

Tomato Paste

5

Minced Garlic

A union would give us the following result set as shown in Table 5-22.
Table 5-22

Union of Results from Tables 5-21 and 5-22

IngredientID

IngredientName

2

Tomato Sauce

3

Diced Tomatoes

4

Tomato Paste

5

Minced Garlic

However, for a UNION ALL, every record returned in both queries will be returned regardless of whether there are duplicate values in the multiple queries. If we were to run the same query but use a UNION ALL, the T-SQL code would look like Listing 5-2.
SELECT IngredientID, IngredientName
FROM dbo.Ingredient
WHERE IngredientName LIKE '%Tomato%'
UNION ALL
SELECT IngredientID, IngredientName
FROM dbo.Ingredient
WHERE Measurement = 'Ounces'
Listing 5-2

Union All of Two Queries

Since we are returning all data, even if they are duplicate, the result for a UNION ALL would look like the values in Table 5-23.
Table 5-23

Union All of Results from Tables 5-21 and 5-22

IngredientID

IngredientName

2

Tomato Sauce

3

Diced Tomatoes

4

Tomato Paste

4

Tomato Paste

5

Minced Garlic

As you can see, the UNION ALL would return two rows that were identical. There are times where each of these scenarios may be desirable.

There are times where the query logic is complex enough that I may not quickly understand all the T-SQL code or other times where I am troubleshooting very large data sets. If I write a simpler query that returns all the data that I want, I will often write an INTERSECT between those two queries to find where the records match. Using the same general query as earlier, I want to show you what the data returned will look like when using INTERSECT as compared to the other options. In Listing 5-3, you can see the query to find the intersection between the two queries.
SELECT IngredientID, IngredientName
FROM dbo.Ingredient
WHERE IngredientName LIKE '%Tomato%'
INTERSECT
SELECT IngredientID, IngredientName
FROM dbo.Ingredient
WHERE Measurement = 'Ounces'
Listing 5-3

Intersect of Two Queries

We already know what results returned for each query individually from the preceding figures. The actual results for the preceding T-SQL code are shown in Table 5-24.
Table 5-24

Intersect of Results from Tables 5-21 and 5-22

IngredientID

IngredientName

4

Tomato Paste

In addition, if I am trying to find missing records or verify that there are matching records, I will use an EXCEPT between two queries. Listing 5-4 shows what the T-SQL code would look like if you were trying to exclude the results from one query from the results of the other query.
SELECT IngredientID, IngredientName
FROM dbo.Ingredient
WHERE IngredientName LIKE '%Tomato%'
EXCEPT
SELECT IngredientID, IngredientName
FROM dbo.Ingredient
WHERE Measurement = 'Ounces'
Listing 5-4

Except of Two Queries

In this case, the first query returns all results where the ingredient name includes the word tomato. However, the except statement indicates that if a record is returned in both the first and second query, then the record will be excluded from the result set. The results of this query is in Table 5-25.
Table 5-25

Except of Results from Tables 5-21 and 5-22

IngredientID

IngredientName

2

Tomato Sauce

3

Diced Tomatoes

Using an EXCEPT statement may not be the preferred method to exclude results from a data set, but it is the method that I am more accustomed to using. Therein lies the benefits and challenges of T-SQL code, for almost any scenario there is almost certainly more than one way to write T-SQL code. Depending on the reason you are writing database code will determine the level of flexibility you have in writing your code. If you are executing a query for a single time, you may be able to use a less efficient method to access this data. However, if you are writing your code for an application, you will want to balance how you write your T-SQL code to be both efficient and readable.

Listing 5-5 is an example of how to use data sets when inserting data.
CREATE TABLE #TempIngredientNutrition
(
      IngredientID      INT         NOT NULL,
      IngredientName    VARCHAR(25) NOT NULL,
      Calories          TINYINT     NOT NULL,
      Sugar             TINYINT     NOT NULL,
      Fiber             TINYINT     NOT NULL,
      Protein           TINYINT     NOT NULL
)
INSERT INTO #TempIngredientNutrition
(
      IngredientID,
      IngredientName,
      Calories,
      Sugar,
      Fiber,
      Protein
)
SELECT
      ing.IngredientID,
      ing.IngredientName,
      nutr.Calories,
      nutr.Sugar,
      nutr.Fiber,
      nutr.Protein
FROM dbo.Ingredient ing
      INNER JOIN dbo.IngredientNutrition nutr
      ON ing.IngredientID = nutr.IngredientID
Listing 5-5

Inserting Data as a Set

As shown in Listing 5-6, this is what the insert statements would look like if there were inserted record by record.
CREATE TABLE #TempIngredientNutrition
(
      IngredientID      INT         NOT NULL,
      IngredientName    VARCHAR(25) NOT NULL,
      Calories          TINYINT     NOT NULL,
      Sugar             TINYINT     NOT NULL,
      Fiber             TINYINT     NOT NULL,
      Protein           TINYINT     NOT NULL
)
INSERT INTO #TempIngredientNutrition
(
      IngredientID,
      IngredientName,
      Calories,
      Sugar,
      Fiber,
      Protein
)
VALUES (1, 'Italian Sausage', 180, 5, 1, 18)
INSERT INTO #TempIngredientNutrition
(
      IngredientID,
      IngredientName,
      Calories,
      Sugar,
      Fiber,
      Protein
)
VALUES (2, 'Tomato Sauce', 60, 7, 2, 1)
INSERT INTO #TempIngredientNutrition
(
      IngredientID,
      IngredientName,
      Calories,
      Sugar,
      Fiber,
      Protein
)
VALUES (3, 'Diced Tomatoes', 20, 2, 3, 1)
Listing 5-6

Inserting Data Record by Record

As you can see, inserting data record by record takes up considerably more code and can be quite a bit more tedious.

One of my favorite uses for data sets is to update data. In many situations, I am not updating a specific record, but I am updating multiple records that share the same characteristics. If I want to make all these ingredients inactive from the table, I have a couple different ways I can do this. For instance, I can write one update statement per record to set the IsActive value to zero as shown in Listing 5-7.
UPDATE dbo.Ingredient
SET IsActive = 0
WHERE IngredientID = 1
UPDATE dbo.Ingredient
SET IsActive = 0
WHERE IngredientID = 2
UPDATE dbo.Ingredient
SET IsActive = 0
WHERE IngredientID = 3
UPDATE dbo.Ingredient
SET IsActive = 0
WHERE IngredientID = 4
UPDATE dbo.Ingredient
SET IsActive = 0
WHERE IngredientID = 5
Listing 5-7

Updating Data Record by Record

This would require five unique transactions. In addition, SQL Server would have to access the data page that each of these records was on five times. As shown in Listing 5-8, I could also write a query that would allow me to update all five records at once.
UPDATE dbo.Ingredient
SET IsActive = 0
WHERE IngredientID BETWEEN 1 AND 5
Listing 5-8

Updating Data as a Range

In fact, if all five of these records lived on the same data page, I would only need to access that page once. A much more likely scenario would be a request to deactivate all ingredients that contained the word tomato in the ingredient name. This is type of pattern you want to find. If there is a pattern like this, you can write a query like the one in Listing 5-9.
UPDATE dbo.Ingredient
SET IsActive = 0
WHERE IngredientName LIKE '%Tomato%'
Listing 5-9

Updating Data as a Data Set

This query accomplishes two goals. It lets you update more than one record in a given query. This T-SQL code also uses the data set for all ingredients that contain the word tomato to determine which records should be updated.

This is not the limit to using data sets for updating multiple records. I have had many situations where I need to update hundreds or thousands of records. While I could write a query to identify each record individually and then manually pull those IDs to update all those records individually or in a set. There are easier ways to update those multiple records and minimize the risk of errors due to human intervention. The query in Listing 5-10 shows you can update data while joining tables together.
UPDATE ing
SET ing.DateModified = GETDATE()
FROM dbo.Ingredient ing
      INNER JOIN dbo.RecipeIngredient recing
      ON ing.IngredientID = recing.IngredientID
      INNER JOIN dbo.Recipe rec
      ON recing.RecipeID = rec.RecipeID
WHERE rec.RecipeName = 'Spaghetti'
Listing 5-10

Updating Data with Joins to Create a Data Set

When using joins during your updates, I would recommend that you confirm what data you are updating. This can be done by converting the update statement to a select statement as shown in Listing 5-11.
SELECT ing.IngredientID,
      ing.IngredientName,
      ing.IsActive,
      ing.DateCreated,
      ing.DateModified
FROM dbo.Ingredient ing
      INNER JOIN dbo.RecipeIngredient recing
      ON ing.IngredientID = recing.IngredientID
      INNER JOIN dbo.Recipe rec
      ON recing.RecipeID = rec.RecipeID
WHERE rec.RecipeName = 'Spaghetti'
Listing 5-11

Verifying the Data Set with a Select Statement

You can check the records and verify the record count. Before performing the update, I suggest wrapping the UPDATE statement in a BEGIN TRAN… ROLLBACK to verify the record count.

Just like working with updates, you can also use data sets when deleting data. I have found that deleting data using data sets can be significantly more efficient, but it can also be somewhat risky if the data is not verified prior to deleting.

One of the largest temptations overall in writing T-SQL code is not taking advantage of data sets. Often, this is a result of being more comfortable writing procedural code or not having developed the ability to think in terms of data sets. My goal throughout this next section is to guide you through various ways to write code that embraces using data sets.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.151.126