Cursor Overhead

In this chapter’s introduction, I talked about the benefits that set-based solutions have over cursor-based ones. I mentioned both logical and performance benefits. For the most part, efficiently written set-based solutions will outperform cursor-based solutions for two reasons.

First, you empower the optimizer to do what it’s so good at–generating multiple valid execution plans, and choosing the most efficient one. When you apply a cursor-based solution, you’re basically forcing the optimizer to go with a rigid plan that doesn’t leave much room for optimization–at least not as much room as with set-based solutions.

Second, row-by-row manipulation creates a lot of overhead. You can run some simple tests to witness and measure this overhead–for example, just scanning a table with a simple query and comparing the results to scanning it with a cursor. To compare apples to apples, make sure you’re scanning the same amount of data as you did with the cursor-based query. You can eliminate the actual disk I/O cost by running the code twice. (The first run will load the data to cache.) To eliminate the time it takes to generate the output, you should run your code with the Discard Results After Execution option in SQL Server Management Studio (SSMS) turned on. The difference in performance between the set-based code and the cursor code will then be the cursor’s overhead.

I will now demonstrate how to compare scanning the same amount of data with set-based code versus with a cursor. Run the following code to generate a table called T1, with a million rows, each containing slightly more than 200 bytes:

SET NOCOUNT ON;
USE tempdb;
GO
IF OBJECT_ID('dbo.T1') IS NOT NULL
  DROP TABLE dbo.T1;
GO
SELECT n AS keycol, CAST('a' AS CHAR(200)) AS filler
INTO dbo.T1
FROM dbo.Nums
WHERE n <= 1000000;

CREATE UNIQUE CLUSTERED INDEX idx_keycol ON dbo.T1(keycol);

You can find the code to create and populate the Nums table in Chapter 1.
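In case you don't have Chapter 1 at hand, here's one minimal way to create and populate such an auxiliary table of numbers; the exact definition used in Chapter 1 may differ, and the CTE names here (D0 through D5, Seq) are purely illustrative. Each cross join of the two-row CTE squares the row count, so D5 logically yields more than four billion rows; ROW_NUMBER assigns sequence numbers, and the filter keeps the first million:

SET NOCOUNT ON;
USE tempdb;
GO
IF OBJECT_ID('dbo.Nums') IS NOT NULL
  DROP TABLE dbo.Nums;
GO
CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY);

WITH
  D0 AS (SELECT 1 AS c UNION ALL SELECT 1),                -- 2 rows
  D1 AS (SELECT 1 AS c FROM D0 AS a CROSS JOIN D0 AS b),   -- 4 rows
  D2 AS (SELECT 1 AS c FROM D1 AS a CROSS JOIN D1 AS b),   -- 16 rows
  D3 AS (SELECT 1 AS c FROM D2 AS a CROSS JOIN D2 AS b),   -- 256 rows
  D4 AS (SELECT 1 AS c FROM D3 AS a CROSS JOIN D3 AS b),   -- 65,536 rows
  D5 AS (SELECT 1 AS c FROM D4 AS a CROSS JOIN D4 AS b),   -- 4,294,967,296 rows
  Seq AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM D5)
INSERT INTO dbo.Nums(n)
  SELECT n FROM Seq WHERE n <= 1000000;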

Turn on the Discard Results After Execution option in SSMS (under Tools|Options|Query Results|SQL Server|Results to Grid or Results to Text). Now clear the cache:

DBCC DROPCLEANBUFFERS;
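Note that DBCC DROPCLEANBUFFERS removes only clean (unmodified) pages from the buffer pool. If you've just populated T1, some of its pages might still be dirty, so it's a good idea to issue a CHECKPOINT first to flush them to disk; the subsequent DROPCLEANBUFFERS then leaves you with a truly cold cache:

CHECKPOINT;
DBCC DROPCLEANBUFFERS;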

Run the following set-based code twice–the first run will measure performance against cold cache, and the second will measure it against warm cache:

SELECT keycol, filler FROM dbo.T1;

On my system, this query ran for 4 seconds against cold cache and 2 seconds against warm cache. Clear the cache again, and then run the cursor code twice:

DECLARE @keycol AS INT, @filler AS CHAR(200);
DECLARE C CURSOR FAST_FORWARD FOR SELECT keycol, filler FROM dbo.T1;
OPEN C;
FETCH NEXT FROM C INTO @keycol, @filler;
WHILE @@fetch_status = 0
BEGIN
  -- Process data here
  FETCH NEXT FROM C INTO @keycol, @filler;
END
CLOSE C;
DEALLOCATE C;

This code ran for 22 seconds against cold cache and 20 seconds against warm cache. Considering the warm cache example, in which there's no physical I/O involved, the cursor code ran ten times more slowly than the set-based code, even though I used the fastest cursor type you can get–FAST_FORWARD–and both solutions scanned the same amount of data. The remaining difference is pure cursor overhead. Besides the performance overhead, cursor code also carries development and maintenance overhead. This is a very basic example involving little code; in production environments with more complex code, the problem is, of course, much worse.
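If you prefer to capture the timings programmatically rather than reading them off the SSMS status bar, you can bracket each run with datetime captures. One wrinkle: with Discard Results After Execution turned on, a SELECT that returns the measurement would itself be discarded, so this sketch uses PRINT, which writes to the Messages pane instead. The variable names here are illustrative:

DECLARE @start AS DATETIME;
SET @start = GETDATE();

-- Code being measured
SELECT keycol, filler FROM dbo.T1;

-- PRINT writes to the Messages pane, so its output is not
-- suppressed by the Discard Results After Execution option
PRINT 'Elapsed milliseconds: '
  + CAST(DATEDIFF(ms, @start, GETDATE()) AS VARCHAR(10));

Alternatively, SET STATISTICS TIME ON reports per-statement CPU and elapsed times in the Messages pane, though the measurement itself adds a small amount of overhead.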
