Chapter 30. Reusing space in a table

Joe Webb

People who have switched to Microsoft SQL Server from Microsoft Access sometimes ask, “If I delete a bunch of rows, do I need to compact my SQL Server database?”

It’s been many years since I’ve used Access, but I still remember the reason for their concern. Access would continually add rows to the end of the table. If some, or even all, of the rows were deleted from the table, Access wouldn’t reuse the space. It kept adding rows to the end of the table and never backfilled the holes. Compacting the Access database file would get rid of the holes.

Understanding how SQL Server automatically reuses table space

I’m not an expert in Access, and I’m certainly not knocking it. I haven’t even used Access since v2.0 back in the 1990s. This behavior may have changed since then, or perhaps I misremember, but suffice it to say that with SQL Server this is not an issue. But don’t take my word for it. Let’s consider an example to prove the point.

To set up the example, let’s create a table with three columns and then populate it with test data. The T-SQL code for doing this is shown in listing 1.

Listing 1. Creating and populating the dbo.Test table
USE tempdb ;
GO

--create a test table
CREATE TABLE dbo.Test
(
col1 INT
,col2 CHAR(25)
,col3 VARCHAR(4000)
) ;

--create some test data
DECLARE @cnt INT ;
SET @cnt = 0 ;
WHILE @cnt < 1000
BEGIN
SELECT
@cnt = @cnt + 1 ;
INSERT
dbo.Test ( col1,col2,col3 )
VALUES (
@cnt
,'test row # ' + CAST(@cnt AS VARCHAR(10)) + 'A'
,REPLICATE('ABCD', ROUND(RAND() * @cnt, 0))
) ;
END

The CREATE TABLE statement creates a table called Test as part of the dbo schema in the tempdb database. The table is composed of three columns. The first is an integer, the second is a fixed length character string that contains 25 characters, and the third and final column is a variable length character string that can contain up to 1900 characters.

To add test data, we will use a simple INSERT statement. The INSERT statement has been placed inside a WHILE loop and is executed 1,000 times. Note that the last column in the table will vary in length from zero to a maximum of 4,000. Statistically, it should average around 1,000 characters.

Let’s view the contents of the table to make sure we have what we think we have. We can do this by running a SELECT statement to retrieve the data, as shown in listing 2. The results of the query are shown in figure 1.

Figure 1. Results from dbo.Test table

Listing 2. Querying the dbo.Test table
--view the table
SELECT
*
FROM
dbo.Test ;

To examine the amount of space the table consumes, we’ll use a Dynamic Management View (DMV) called sys.dm_db_index_physical_stats. DMVs were first introduced in SQL Server 2005 and have been continued in SQL Server 2008. (For more on DMVs, see chapter 29, “My favorite DMVs, and why,” by Aaron Bertrand.)


Note

The scripts that make use of DMVs will not work in SQL Server 2000.


The query in listing 3 returns a single row of data with four columns—the allocation unit type, the page count, the average page space used as a percentage, and the total number of rows in the dbo.Test table. The alloc_unit_type_desc column describes the allocation unit type: valid values are IN_ROW_DATA, LOB_DATA, or ROW_OVERFLOW_DATA (see Books Online for a detailed explanation of these terms). The page_count column indicates the total number of data pages used by the table for IN_ROW_DATA (that is, the allocation unit that stores the table rows in our example). The avg_page_space_used_in_percent column reports the average percentage of space used in all data pages in the IN_ROW_DATA allocation type. The record_count intuitively contains the number of rows contained in the table.

Listing 3. Examining the space used by the dbo.Test table
--check the size of the table
SELECT
alloc_unit_type_desc
,page_count
,avg_page_space_used_in_percent
,record_count
FROM
sys.dm_db_index_physical_stats(
DB_ID()
,OBJECT_ID(N'dbo.Test')
,NULL
,NULL
,'Detailed') ;

Figure 2 displays the results of the DMV query. As you can see on my test system, 158 data pages are used to store the 1,000 rows and each data page is, on average, 82.1 percent full.

Figure 2. Using a DMV to review space used

To continue with the example, let’s delete half of the rows in our dbo.Test table and see what happens to the space used by the table. The T-SQL script in listing 4 uses the modulo operator, represented by the percent sign in T-SQL, to delete each row where the value in the first column, col1, is an odd number. So we are deleting every other row in the table.

Listing 4. Deleting the odd-numbered rows
--delete the odd rows
DELETE FROM
Test
WHERE
col1 % 2 = 1

--view the table
SELECT
*
FROM
dbo.Test ;

Figure 3 shows the results of the SELECT query. You can see that the table is left with test row #2, test row #4, and so on, for a total of 500 rows.

Figure 3. Deleting half the rows in the dbo.Test table

Now that half of the rows in the table have been deleted, let’s look at the space the table consumes by running the DMV query from listing 3 again.

Figure 4 shows that the number of pages used to store the table’s data remains constant but the percentage used of each page changes. It is cut in half, from 82.1 percent to 42.1 percent. Also notice that the number of rows reported in the record_count column has been cut in half, as expected.

Figure 4. Examining the space consumed by the dbo.Test table

So let’s add some new rows to the table to prove that SQL Server will automatically reuse the newly freed space, thus filling in the holes, to go back to our Access comparison. Using the T-SQL script found in listing 5, we can quickly add 500 rows of data to the dbo.Test table. This script is similar to the one first used to populate the table with sample data. It inserts one row at a time in a WHILE loop. To help differentiate the new rows from the existing rows, I change the insert statement so that the third column is filled with WXYZ’s rather than ABCD’s as before, though it doesn’t matter for our proof.

Listing 5. Adding new rows to the dbo.Test table
--add some more test data
DECLARE @cnt INT ;
SET @cnt = 0 ;
WHILE @cnt < 500
BEGIN
SELECT
@cnt = @cnt + 1 ;
INSERT
dbo.Test ( col1,col2,col3 )
VALUES (
@cnt
,'test row # ' + CAST(@cnt AS VARCHAR(10))
,REPLICATE('WXYZ', ROUND(RAND() * @cnt, 0))
) ;
END

Now let’s once again use the sys.dm_db_index_physical_stats DMV to see the space used by the table. Figure 5 displays the result of the query.

Figure 5. Reviewing the space used by the dbo.Test table after inserting new rows

Notice that the table is still using 158 data pages, but the average percent used for each page has increased to 62.9 percent from the prior value of 42.1 percent. The results confirm our expectations—the holes were indeed reused by SQL Server. No additional pages were allocated; the free space available was used.

At this point it is important to understand that, although SQL Server can and will automatically reuse space that was previously used by one or more rows, it will do so only under very specific circumstances. This behavior is most readily demonstrated through the use of a heap (a table without a clustered index) in which there is no predefined order of the rows. The behavior can also be observed in clustered tables when the newly inserted rows have key values that allow them to be inserted into the holes left by prior deletions. You shouldn’t expect all newly inserted rows to fit nicely in the holes of a clustered table. The point is that SQL Server will reuse space as appropriate.

To clean up after this example, let’s execute one final statement to drop the dbo.Test table. Listing 6 displays the DROP statement.

Listing 6. Dropping the dbo.Test table
--clean up
DROP TABLE dbo.Test ;

Recognizing when SQL Server does not reclaim space

Under certain circumstances SQL Server does not automatically reclaim space that is no longer being used. If a table definition is altered to drop one or more variable length columns, the space consumed by those columns is not immediately made available for reuse by SQL Server.

To illustrate this behavior, let’s consider an example. Let’s create another test table using the script in listing 7. The script creates the table and populates it with 1,000 rows of data.

Listing 7. Creating the dbo.Test2 table
USE tempdb ;
GO

--create the dbo.Test2 table
CREATE TABLE dbo.Test2
(
col1 INT
,col2 CHAR(25)
,col3 VARCHAR(4000)
) ;

--create some test data
DECLARE @cnt INT ;
SET @cnt = 0 ;
WHILE @cnt < 1000
BEGIN
SELECT
@cnt = @cnt + 1 ;
INSERT
dbo.Test2 ( col1,col2,col3)
VALUES (
@cnt
,'test row # ' + CAST(@cnt AS VARCHAR(10))
,REPLICATE('A', 4000)
) ;
END

Figure 6 shows the results from the SELECT statement. This table has three columns of data, an integer in the first column, a character string in the second column that can contain up to 25 characters, and a variable length character string in the final column that contains 4,000 characters.

Figure 6. Viewing data in the dbo.Test2 table

Using the query shown in listing 3, we can see how much space our newly created dbo.Test2 table is consuming. Figure 7 shows the results. The newly created table takes up 500 data pages to store the 1,000 rows of data. Each data page is, on average, 99.9 percent full.

Figure 7. Space used by the dbo.Test2 table

Now to set up our test scenario, let’s drop the third column, the one that consumes the most space. Listing 8 contains the ALTER TABLE script to drop col3; it then executes the DMV query to reveal the space used by the table.

Listing 8. Dropping a varchar column in the dbo.Test table
--drop the last column
ALTER TABLE dbo.Test2 DROP COLUMN col3 ;

--check the space used again
SELECT
alloc_unit_type_desc
,page_count
,avg_page_space_used_in_percent
,record_count
FROM
sys.dm_db_index_physical_stats(
DB_ID()
,OBJECT_ID(N'dbo.Test2')
,NULL
,NULL
,'Detailed') ;

Looking at the results in figure 8, we can see that we get the same results as before—500 data pages, storing 1,000 rows, each 99.9 percent full, and this after dropping the column that consumed the most space.

Figure 8. Reviewing the space used after dropping a column

Why is this? When a table is altered to drop a column, SQL Server does not remove the column data from the data pages. Instead it updates the metadata in the system tables so that when queried, it appears as if the column no longer exists. The data is still present in the data pages, but it’s not returned as a part of a result set. Thus, the space cannot be reused initially.

So let’s add some additional rows to the table and see what happens. The script shown in listing 9 inserts 500 additional rows into the dbo.Test2 table. Notice that this script inserts only two columns of data because we’ve dropped the third column, col3.

Listing 9. Adding more rows to the dbo.Test2 table
--insert additional rows
DECLARE @cnt INT ;
SET @cnt = 0 ;
WHILE @cnt < 500
BEGIN
SELECT
@cnt = @cnt + 1 ;
INSERT
dbo.Test2 ( col1,col2 )
VALUES (
@cnt
,'test row # ' + CAST(@cnt AS VARCHAR(10))
) ;
END

When we use the sys.dm_db_index_physical_stats DMV to examine the space used by the dbo.Test2 table, we find that the number of data pages increases slightly, indicating that the space once consumed by the dropped column was not automatically reused. Figure 9 shows the results.

Figure 9. Inserting rows after dropping a column

If the space had been automatically reused, there should have been more than enough space to contain the additional 500 rows without having to add more space. But because the columns were only marked as no longer being part of the table in the metadata, there is no space to reuse. So SQL Server added additional pages to make space for the new rows.

Using DBCC CLEANTABLE to reclaim unused table space

Although it is comforting to know that in certain cases SQL Server will automatically reuse space once consumed by deleted rows, we’ve just seen that it will not automatically reuse space once consumed by dropped columns.

Fortunately, this space is not lost forever; we can reclaim this space by issuing a DBCC command. The DBCC CLEANTABLE command allows us to specify a database and table, and it will free up any space once consumed by dropped variable length character columns.

To reclaim the space in our dbo.Test2 table, run the T-SQL command found in listing 10.

Listing 10. Reclaiming space using DBCC CLEANTABLE
--reclaim the space from the table
DBCC CLEANTABLE('tempdb', 'dbo.Test2') ;

If it succeeds, you should receive a message similar to the following as an output message in the query window.

DBCC execution completed. If DBCC printed error messages, contact your system
administrator.

DBCC CLEANTABLE reclaims space from variable length columns that no longer exist as part of the table definition. In this context a variable length column can be one of the following data types: varchar, nvarchar, varchar(max), nvarchar(max), varbinary, varbinary(max), text, ntext, image, sql_variant, and xml.

Now that we’ve used the DBCC command to reclaim the space, let’s return to our DMV query and examine the space used by the table by running listing 3.

Figure 10 shows the results. Notice that the number of data pages consumed by the table did not decrease; however, the average space used as a percentage decreased dramatically to 1.4 percent from 99.7 percent.

Figure 10. The space consumed by dbo.Test2 after running DBCC CLEANTABLE

To prove that the space is truly available for reuse, let’s add another 3,000 rows to the table by altering and running listing 9.

Now checking the used space by again running the DMV query in listing 3, we see that the number of pages used to make the table did not increase. Notice, though, that the average page space used as a percentage did increase to 4.2 percent from 1.4 percent. Figure 11 shows the results.

Figure 11. Inserting rows into reclaimed space

Summary

When rows are deleted from a table without a clustered index (an object known as a heap), SQL Server can readily reuse the space that was once consumed by deleted rows. The same may hold true for clustered tables, as long as the newly inserted rows have the required key values that would allow them be placed in a hole created by a deleted row.

There are certain circumstances, in which deleted space is not immediately released for reuse; in particular when variable length character columns are dropped from a table. By employing the DBCC CLEANTABLE command we can reclaim the space once consumed by the dropped columns and make better use of the disk resources at hand.

About the author

Joe Webb, a Microsoft SQL Server MVP, serves as Chief Operating Manager for WebbTech Solutions, a Nashville-based IT consulting company. He has over 15 years of industry experience and has consulted extensively with companies in the areas of business process analysis and improvements, database design and architecture, software development, and technical training.

In addition to helping his consulting clients, Joe enjoys writing and speaking at technical conferences. He has delivered over 50 sessions at conferences in Europe and North America and has authored two other books.

Joe served for six years on the Board of Directors for the Professional Association for SQL Server (PASS), an international user group with 30,000 members worldwide. He culminated his tenure on the board by serving as the Executive Vice President of Finance for the organization. Joe also volunteers his time by serving on the MBA Advisory Board for Auburn University and the Computer Science Advisory Committee for Nashville State Community College.

When he’s not consulting, Joe enjoys spending time with his family and tending to the animals and garden on his small farm in middle Tennessee. He can be reached at [email protected].

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.27.58