Managing Large Tables and Databases
Very large tables (where you have row counts in the tens of millions) have special needs. All data manipulations and maintenance operations need special considerations. This chapter will deal with features in SQL Server that can help; specifically, I’ll cover how partitioning a table can ease data movements and how the use of filegroups and data compression can help you improve performance by distributing data I/O across multiple drives and having a smaller amount of data to store on disk.
Table partitioning provides you with a built-in method of horizontally partitioning data within a table or index while still maintaining a single logical object. Horizontal partitioning involves keeping the same number of columns in each partition but reducing the number of rows. Partitioning can ease the management of very large tables or indexes; data can be loaded into a partitioned table in seconds instead of minutes or hours; query performance can be improved; and you can perform maintenance operations more quickly, allowing for smaller maintenance windows. You can also improve performance by enabling lock escalation so as to lock at the partition level before locking at the table level. The recipes in this chapter will demonstrate how to use Transact-SQL commands to create, modify, and manage partitions and to partition database objects.
This chapter will also cover filegroup placement. Database data files belong to filegroups. Every database has a primary filegroup, and you can add additional filegroups as needed. The addition of new filegroups to a database is often used for very large databases (VLDBs) because filegroups can ease backup administration and potentially improve performance by distributing data over multiple arrays. Data compression is used to put more data in a given amount of space, reducing disk I/O at the cost of increased CPU usage in performing the compression and decompression to work with the data.
Note Files and filegroups are covered in detail in the “Files, Filegroups, and Integrity” chapter.
The recipes in this chapter will be utilizing your company’s database, MegaCorpData. The database and additional files will be created on your C: drive, in a folder named Apress. The database is created from the following script:
USE master;
GO
EXECUTE xp_create_subdir 'C:Apress';
IF DB_ID('MegaCorpData') IS NOT NULL DROP DATABASE MegaCorpData;
GO
CREATE DATABASE MegaCorpData
ON PRIMARY
(NAME = 'MegaCorpData',
FILENAME = 'C:ApressMegaCorpData.MDF',
SIZE = 3MB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1MB)
LOG ON
(NAME = 'MegaCorpData_Log',
FILENAME = 'C:ApressMegaCorpData.LDF',
SIZE = 3MB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1MB);
GO
Note Table partitioning is a Developer or Enterprise edition (or higher) feature.
15-1. Partitioning a Table
Problem
You are adding a table (dbo.WebSiteHits) to your company’s database (MegaCorpData) to track each hit to your company’s web site. This table is expected to grow very large, very quickly. Because of its potential size, you are concerned that queries will not perform very well and that database backups may take longer than what your maintenance window allows for.
Solution
Partition the table into multiple filegroups, with each filegroup having its files on a different disk.
How It Works
The first step is to create multiple filegroups:
ALTER DATABASE MegaCorpData ADD FILEGROUP hitfg1;
ALTER DATABASE MegaCorpData ADD FILEGROUP hitfg2;
ALTER DATABASE MegaCorpData ADD FILEGROUP hitfg3;
ALTER DATABASE MegaCorpData ADD FILEGROUP hitfg4;
The next step is to add files to each filegroup:
ALTER DATABASE MegaCorpData
ADD FILE (NAME = mchitfg1,
FILENAME = 'C:Apressmc_hitfg1.ndf',
SIZE = 1MB)
TO FILEGROUP hitfg1;
ALTER DATABASE MegaCorpData
ADD FILE (NAME = mchitfg2,
FILENAME = 'C:Apressmc_hitfg2.ndf',
SIZE = 1MB)
TO FILEGROUP hitfg2;
ALTER DATABASE MegaCorpData
ADD FILE (NAME = mchitfg3,
FILENAME = 'C:Apressmc_hitfg3.ndf',
SIZE = 1MB)
TO FILEGROUP hitfg3;
ALTER DATABASE MegaCorpData
ADD FILE (NAME = mchitfg4,
FILENAME = 'C:Apressmc_hitfg4.ndf',
SIZE = 1MB)
TO FILEGROUP hitfg4;
Now that we have filegroups with files ready to receive data, we need to create a partition function, which will determine how the table will have its data horizontally partitioned by mapping rows to partitions based upon the value of a specified column:
USE MegaCorpData;
GO
CREATE PARTITION FUNCTION HitsDateRange (datetime)
AS RANGE LEFT FOR VALUES ('2006-01-01T00:00:00', '2007-01-01T00:00:00', '2008-01-01T00:00:00'),
The partition function specifies the name of the function, the data type, whether the range of boundaries is bound to the left or right (in this example, left was used), and the values that define the data in each boundary. You cannot specify a data type of text, ntext, image, xml, timestamp, varchar(max), varbinary(max), or nvarchar(max), nor can you use alias data types or CLR-defined data types. The number of values that you choose amounts to a total of n + 1 partitions. You can have up to 15,000 partitions, so you can specify up to 14,999 boundaries. If the values are not specified in order, the database engine sorts the values, creates the function, and returns a warning that the values were not provided in order. If there are any duplicate values, the database engine returns an error. The first partition contains values less than the lowest specified value, and the last partition contains values higher than the highest specified value. RANGE LEFT is used to specify that the upper boundary of each partition is the value specified; RANGE RIGHT is used to specify that the upper boundary of each partition is less than the specified value. In this case, we are specifying the first day of each year, creating yearly partitions. If you wanted to partition the data by month, you would just include values for the first of each month. Tables 15-1 and 15-2 show how the partition boundaries for the previous values are set for the specified dates.
Table 15-1. RANGE LEFT Boundaries
Partition # |
Values |
---|---|
1 |
<= '2006-01-01' |
2 |
> '2006-01-01' and <= '2007-01-01' |
3 |
> '2007-01-01' and <= '2008-01-01' |
4 |
> '2008-01-01' |
Table 15-2. RANGE RIGHT Boundaries
Partition # |
Values |
---|---|
1 |
< '2006-01-01' |
2 |
>= '2006-01-01' and < '2007-01-01' |
3 |
>= '2007-01-01' and < '2008-01-01' |
4 |
>= '2008-01-01' |
Once a partition function is created, it can be used in one or more partition schemes. A partition scheme maps the partitions defined in a partition function to actual filegroups. For example:
CREATE PARTITION SCHEME HitDateRangeScheme
AS PARTITION HitsDateRange
TO (hitfg1, hitfg2, hitfg3, hitfg4);
In this statement, you assign a name to the partition scheme and specify what partition function the scheme is bound to and which filegroups are assigned to each partition.
Now that all of the preliminary work is done, the new partitioned table can be built:
CREATE TABLE dbo.WebSiteHits (
WebSiteHitID BIGINT NOT NULL IDENTITY(1, 1),
WebSitePage VARCHAR(255) NOT NULL,
HitDate DATETIME NOT NULL,
CONSTRAINT PK_WebSiteHits PRIMARY KEY CLUSTERED (WebSiteHitId, HitDate)
)
ON [HitDateRangeScheme] (HitDate);
There are a couple of items to note about this CREATE TABLE statement. The first is the ON clause; it specifies which partition scheme to put the table on. The second item is the PRIMARY KEY constraint definition; while the primary key is unique with just the identity column (unless you deliberately add duplicate values to that column), the partitioning column has been added to it. This is because all unique indexes, including those that are automatically built from PRIMARY KEY and UNIQUE constraints, need to have the partitioning column included in the index key.
15-2. Locating Data in a Partition
Problem
You want to ensure that data is being stored in the expected partitions.
Solution
Utilize the $PARTITION function to return the partition that a row is stored in:
INSERT dbo.WebSiteHits (WebSitePage, HitDate)
VALUES ('Home Page', '2007-10-22T00:00:00'),
('Home Page', '2006-10-02T00:00:00'),
('Sales Page', '2008-05-09T00:00:00'),
('Sales Page', '2000-03-04T00:00:00'),
SELECT WebSitePage,
HitDate,
$PARTITION.HitsDateRange (HitDate) AS [Partition]
FROM dbo.WebSiteHits;
This query returns the following result set:
WebSitePage HitDate Partition
----------- ----------------------- -----------
Sales Page 2000-03-04 00:00:00.000 1
Home Page 2006-10-02 00:00:00.000 2
Home Page 2007-10-22 00:00:00.000 3
Sales Page 2008-05-09 00:00:00.000 4
How It Works
This example starts by inserting four rows into the table. Based on the dates inserted, each row should be in a separate partition. Next, a query is run to select the data from the table, and the query utilizes the $PARTITION function to return which partition the data is in. The syntax of the $PARTITION function is as follows:
$PARTITION.partition_function_name(expression)
where partition_function_name is the name of the partition function used to partition the table, and expression is the name of the partitioning column.
The $PARTITION function evaluates each HitDate and determines which partition said data is stored in based on the partition function. This allows you to see how your data is stored and how it is distributed across the different partitions. If one partition has an uneven distribution, you can explore creating new partitions or removing existing partitions, both of which are demonstrated in the upcoming recipes.
15-3. Adding a Partition
Problem
You’re into the last year that your partition scheme covers, so you need to add partitions.
Solution
Utilize the ALTER PARTITION SCHEME and ALTER PARTITION FUNCTION statements to extend the partition onto a new or existing filegroup and to create the new partition. For example:
ALTER PARTITION SCHEME HitDateRangeScheme NEXT USED [PRIMARY];
GO
ALTER PARTITION FUNCTION HitsDateRange () SPLIT RANGE ('2009-01-01T00:00:00'),
GO
How It Works
This example starts by using the ALTER PARTITION SCHEME statement to designate the next partition filegroup to use. The syntax for ALTER PARTITION SCHEME is as follows:
ALTER PARTITION SCHEME partition_scheme_name NEXT USED [ filegroup_name ]
where partition_scheme_name is the name of the partition scheme to modify. NEXT USED [filegroup_name] queues the specified filegroup to be used next by the next new partition created with an ALTER PARTITION FUNCTION statement.
In a given partition scheme, you can have only one filegroup that is designated NEXT USED. The filegroup does not need to be empty to be used.
In this example, we are specifying that the PRIMARY filegroup will be the filegroup that the next partition is placed on.
Next, the example uses the ALTER PARTITION FUNCTION statement to create (split) the new partition by splitting the partition boundaries. The syntax for ALTER PARTITION FUNCTION is as follows:
ALTER PARTITION FUNCTION partition_function_name() {
SPLIT RANGE ( boundary_value ) | MERGE RANGE ( boundary_value ) }
where partition_function_name is the name of the partition function to add or remove a partition from. SPLIT RANGE is used to create a new partition by defining a new boundary value; MERGE RANGE is used to remove an existing partition at the specified boundary and to move any existing records to another partition.
The existing partition is split, using the original boundary type of LEFT or RIGHT. You can split only one partition at a time. After this split, the partition layout now looks like Table 15-3.
Table 15-3. New RANGE LEFT Boundaries
Partition # |
Values |
---|---|
1 |
<= '2006-01-01' |
2 |
> '2006-01-01' and <= '2007-01-01' |
3 |
> '2007-01-01' and <= '2008-01-01' |
4 |
> '2008-01-01' and <= '2009-01-01' |
5 |
> '2009-01-01' |
Once the new partition is created, any new row added that qualifies to go to the new partition will be stored in that partition.
INSERT dbo.WebSiteHits
(WebSitePage, HitDate)
VALUES ('Sales Page', '2009-03-04T00:00:00'),
SELECT WebSitePage,
HitDate,
$PARTITION.HitsDateRange (HitDate) AS [Partition]
FROM dbo.WebSiteHits;
This query returns the following result set:
WebSitePage HitDate Partition
----------- ----------------------- -----------
Sales Page 2000-03-04 00:00:00.000 1
Home Page 2006-10-02 00:00:00.000 2
Home Page 2007-10-22 00:00:00.000 3
Sales Page 2008-05-09 00:00:00.000 4
Sales Page 2009-03-04 00:00:00.000 5
15-4. Removing a Partition
Problem
You need to remove a partition and move the data in that partition into another partition.
Solution
Utilize the ALTER PARTITION FUNCTION statement to remove a partition and merge the data in that partition into another partition. For example:
ALTER PARTITION FUNCTION HitsDateRange () MERGE RANGE ('2007-01-01T00:00:00'),
GO
SELECT WebSitePage,
HitDate,
$PARTITION.HitsDateRange(HitDate) Partition
FROM dbo.WebSiteHits;
This query returns the following result set:
WebSitePage HitDate Partition
----------- ----------------------- -----------
Sales Page 2000-03-04 00:00:00.000 1
Home Page 2007-10-22 00:00:00.000 2
Home Page 2006-10-02 00:00:00.000 2
Sales Page 2008-05-09 00:00:00.000 3
Sales Page 2009-03-04 00:00:00.000 4
How It Works
Recipe 15-3 showed the syntax for the ALTER PARTITION FUNCTION statement, including a description of the MERGE RANGE functionality that is used to remove an existing partition. Removing a partition merges the specified partition with the preceding partition, with the rows being moved into the new partition.
In this example, the partition with the boundary '2007-01-01' is removed. When the table is queried, you can see that the row in the year 2007 has been moved from the third partition to the second partition.
Table 15-4 shows the new partition layout.
Table 15-4. New RANGE LEFT Boundaries
Partition # |
Values |
---|---|
1 |
<= '2006-01-01' |
2 |
> '2006-01-01' and <= '2008-01-01' |
3 |
> '2008-01-01' and <= '2009-01-01' |
4 |
> '2009-01-01' |
15-5. Determining Whether a Table Is Partitioned
Problem
You need to determine whether a table is partitioned.
Solution
Query the sys.partitions system view to determine the partitions on an object. For example:
SELECT p.partition_id,
p.object_id,
p.partition_number
FROM sys.partitions AS p
WHERE p.partition_id IS NOT NULL
AND p.object_id = OBJECT_ID('dbo.WebSiteHits'),
This query returns the following result set:
partition_id object_id partition_number
-------------------- ----------- ----------------
72057594039042048 245575913 1
72057594039173120 245575913 2
72057594039238656 245575913 4
72057594039304192 245575913 3
Note The partition_id and object_id values will be different on your system.
How It Works
The system view sys.partitions contains a row for each partition of a table as well as for most types of indexes. (All tables contain at least one partition, whether they are specifically partitioned or not.)
15-6. Determining the Boundary Values for a Partitioned Table
Problem
You want to determine what the existing boundaries are for a partition function.
Solution
Query the system views to obtain this information. For example:
SELECT t.name AS TableName,
i.name AS IndexName,
p.partition_number AS [Part#],
f.type_desc,
CASE WHEN f.boundary_value_on_right = 1 THEN 'RIGHT' ELSE 'LEFT' END AS BoundaryType,
r.boundary_id,
r.value AS BoundaryValue
FROM sys.tables AS t
JOIN sys.indexes AS i
ON t.object_id = i.object_id
JOIN sys.partitions AS p
ON i.object_id = p.object_id
AND i.index_id = p.index_id
JOIN sys.partition_schemes AS s
ON i.data_space_id = s.data_space_id
JOIN sys.partition_functions AS f
ON s.function_id = f.function_id
LEFT JOIN sys.partition_range_values AS r
ON f.function_id = r.function_id
AND r.boundary_id = p.partition_number
WHERE t.object_id = OBJECT_ID('dbo.WebSiteHits')
AND i.type <= 1
ORDER BY p.partition_number;
This query returns the following result set:
TableName IndexName Part# type_desc BoundaryType boundary_id BoundaryValue
----------- -------------- ------ --------- ------------ ----------- -----------------------
WebSiteHits PK_WebSiteHits 1 RANGE LEFT 1 2006-01-01 00:00:00.000
WebSiteHits PK_WebSiteHits 2 RANGE LEFT 2 2008-01-01 00:00:00.000
WebSiteHits PK_WebSiteHits 3 RANGE LEFT 3 2009-01-01 00:00:00.000
WebSiteHits PK_WebSiteHits 4 RANGE LEFT NULL NULL
How It Works
The sys.partition_range_values system view contains the information about boundary values for a partition function. Join to the other system views to return more information, such as the table, index, partition number, and type of boundary.
15-7. Determining the Partitioning Column for a Partitioned Table
Problem
You need to determine which column is the partitioning column on a partitioned table.
Solution
Query the system views to obtain the partitioning column for a table. For example:
SELECT t.object_id AS Object_ID,
t.name AS TableName,
ic.column_id AS PartitioningColumnID,
c.name AS PartitioningColumnName
FROM sys.tables AS t
JOIN sys.indexes AS i
ON t.object_id = i.object_id
JOIN sys.partition_schemes AS ps
ON ps.data_space_id = i.data_space_id
JOIN sys.index_columns AS ic
ON ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.partition_ordinal > 0
JOIN sys.columns AS c
ON t.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE t.object_id = OBJECT_ID('dbo.WebSiteHits')
AND i.type <= 1;
This query returns the following result set:
Object_ID TableName PartitioningColumnID PartitioningColumnName
----------- ----------- -------------------- ----------------------
773577794 WebSiteHits 3 HitDate
How It Works
The system views sys.partition_schemes and sys.index_columns can be joined together and, with other system views, can be used to determine which column is the partitioning column.
15-8. Determining the NEXT USED Partition
Problem
When splitting a partition (as shown in Recipe 15-3), you started off by specifying the NEXT USED partition. However, there was a problem during the split, and you need to determine which partition is currently set to be used next.
Solution
Query the system views to determine the NEXT USED partition:
SELECT PartitionSchemaName,
NextUsedPartition = FileGroupName
FROM (SELECT FileGroupName = FG.name,
PartitionSchemaName = PS.name,
RANK() OVER (PARTITION BY PS.name ORDER BY DestDS.destination_id) AS dest_rank
FROM sys.partition_schemes PS
JOIN sys.destination_data_spaces AS DestDS
ON DestDS.partition_scheme_id = PS.data_space_id
JOIN sys.filegroups AS FG
ON FG.data_space_id = DestDS.data_space_id
LEFT JOIN sys.partition_range_values AS PRV
ON PRV.boundary_id = DestDS.destination_id
AND PRV.function_id = PS.function_id
WHERE PRV.value IS NULL
) AS a
WHERE dest_rank = 2;
How It Works
When NEXT USED is specified, there will be two partitions listed in the sys. destination_data_spaces view that are not included in sys.partition_range_values (there will always be one that represents the infinity range). If there is a second one present, it represents the partition that is set to be NEXT USED. We can get the NEXT USED partition for each partition where a NEXT USED has been specified by using joins. This is achieved by first performing a JOIN to sys.partition_schemes (to acquire the partition scheme name) and to sys.destination_data_spaces and sys.filegroups (to get the filegroup name). Next, a LEFT JOIN is performed to sys.partition_range_values to get the second occurrence (via the RANK function) of its NULL value (signifying no match on the join).
15-9. Moving a Partition to a Different Partitioned Table
Problem
You want to move the older data in your partitioned table to a history table.
Solution
Utilize the ALTER TABLE statement to move partitions between tables. For example:
CREATE TABLE dbo.WebSiteHitsHistory
(
WebSiteHitID BIGINT NOT NULL IDENTITY,
WebSitePage VARCHAR(255) NOT NULL,
HitDate DATETIME NOT NULL,
CONSTRAINT PK_WebSiteHitsHistory PRIMARY KEY (WebSiteHitID, HitDate)
)
ON [HitDateRangeScheme](HitDate);
GO
ALTER TABLE dbo.WebSiteHits SWITCH PARTITION 1 TO dbo.WebSiteHitsHistory PARTITION 1;
GO
SELECT WebSitePage,
HitDate,
$PARTITION.HitsDateRange(HitDate) Partition
FROM dbo.WebSiteHits;
SELECT WebSitePage,
HitDate,
$PARTITION.HitsDateRange(HitDate) Partition
FROM dbo.WebSiteHitsHistory;
These queries return the following result sets:
WebSitePage HitDate Partition
----------- ----------------------- -----------
Home Page 2007-10-22 00:00:00.000 2
Home Page 2006-10-02 00:00:00.000 2
Sales Page 2008-05-09 00:00:00.000 3
Sales Page 2009-03-04 00:00:00.000 4
WebSitePage HitDate Partition
----------- ----------------------- -----------
Sales Page 2000-03-04 00:00:00.000 1
How It Works
With SQL Server’s partitioning functionality, you can transfer partitions between different tables with a minimum of effort or overhead. Partitions are transferred between tables using the ALTER TABLE...SWITCH statement. Transfers can take place in three ways: switching a partition from one partitioned table to another partitioned table (both tables need to be partitioned on the same column), transferring an entire table from a nonpartitioned table to a partitioned table, or moving a partition from a partitioned table to a nonpartitioned table. The basic syntax of the ALTER TABLE statement used to switch partitions is as follows:
ALTER TABLE [ schema_name. ] tablename
SWITCH [ PARTITION source_partition_number_expression ]
TO [ schema_name. ] target_table
[ PARTITION target_partition_number_expression ]
Table 15-5 details the arguments of this command.
Table 15-5. ALTER TABLE...SWITCH Arguments
Argument |
Description |
---|---|
[ schema_name. ] tablename |
The source table to move the partition from |
source_partition_number_expression |
The partition number being relocated |
[ schema_name. ] target_table |
The target table to receive the partition |
partition.target_partition_number_expression |
The destination partition number |
This example starts by creating a history table (WebSiteHitsHistory). Next, the ALTER TABLE statement is used to move partition 1 from the WebSiteHits table to partition 1 of the WebSiteHitsHistory table. Finally, both tables are queried to show the data that is in each table and which partition the data is in.
Moving partitions between tables is much faster than performing a manual row operation (INSERT...SELECT, for example) because you aren’t actually moving physical data. Instead, you are only changing the metadata regarding which table the partition is currently associated with. Also, keep in mind that the target partition of any existing table needs to be empty so as to accommodate the incoming partition. If it is a nonpartitioned table, the table must be empty.
15-10. Moving Data from a Nonpartitioned Table to a Partition in a Partitioned Table
Problem
You have just found the long-lost spreadsheet that the original web site designer saved the web hits into. You have loaded this data into a table, and you want to add it to your WebSiteHits table.
Solution
Utilize the ALTER TABLE statement to move the data from the nonpartitioned table to an empty partition in the partitioned table. For example:
IF OBJECT_ID('dbo.WebSiteHitsImport','U') IS NOT NULL DROP TABLE dbo.WebSiteHitsImport;
GO
CREATE TABLE dbo.WebSiteHitsImport
(
WebSiteHitID BIGINT NOT NULL IDENTITY,
WebSitePage VARCHAR(255) NOT NULL,
HitDate DATETIME NOT NULL,
CONSTRAINT PK_WebSiteHitsImport PRIMARY KEY (WebSiteHitID, HitDate),
CONSTRAINT CK_WebSiteHitsImport CHECK (HitDate <= '2006-01-01T00:00:00')
)
ON hitfg1;
GO
INSERT INTO dbo.WebSiteHitsImport (WebSitePage, HitDate)
VALUES ('Sales Page', '2005-06-01T00:00:00'),
('Main Page', '2005-06-01T00:00:00'),
GO
-- partition 1 is empty – move data to this partition
ALTER TABLE dbo.WebSiteHitsImport SWITCH TO dbo.WebSiteHits PARTITION 1;
GO
-- see the data
SELECT WebSiteHitId,
WebSitePage,
HitDate,
$PARTITION.HitsDateRange(HitDate) Partition
FROM dbo.WebSiteHits;
SELECT WebSiteHitId,
WebSitePage,
HitDate,
$PARTITION.HitsDateRange(HitDate) Partition
FROM dbo.WebSiteHitsImport;
These queries return the following result sets:
WebSiteHitId WebSitePage HitDate Partition
-------------------- ----------- ----------------------- -----------
1 Sales Page 2005-06-01 00:00:00.000 1
2 Main Page 2005-06-01 00:00:00.000 1
1 Home Page 2007-10-22 00:00:00.000 2
2 Home Page 2006-10-02 00:00:00.000 2
3 Sales Page 2008-05-09 00:00:00.000 3
5 Sales Page 2009-03-04 00:00:00.000 4
WebSiteHitId WebSitePage HitDate Partition
-------------------- ----------- ----------------------- -----------
How It Works
In this example, we first create a new, nonpartitioned table that the imported data will be loaded into and then insert some records into that table. Next, the ALTER TABLE statement is utilized to move the data from the new, nonpartitioned table into an empty partition in the partitioned table. Finally, SELECT statements are run against the two tables to show where the data is within those tables. Since the source table is not partitioned, the partition number on the source table is not specified in the ALTER TABLE statement.
To move the data from one table to the partitioned table, the table whose data is being moved must be in the same filegroup as the partition that the data is to be moved into for the partitioned table. Additionally, the table whose data is being moved must have the same structure (columns, indexes, constraints) as the partitioned table, and it must have an additional check constraint that enforces that the data in the partitioned column has the same allowable values as the corresponding partition on the partitioned table. Finally, the partition on the partitioned table that the data is being moved to must be empty. Since this is a metadata operation (assigning the existing data pages from one table to another), it makes sense that the data must exist in the same filegroup as the partition and that the partition is empty; otherwise, data would need to be moved through INSERT...SELECT statements.
Caution In this example, both tables have an identity column. If you look at the returned results, there are duplicate values for this identity column. Since the unique constraints include the partitioning column values in addition to the identity column values, these values are valid even though duplicated identity column values are not normally seen.
15-11. Moving a Partition from a Partitioned Table to a Nonpartitioned Table
Problem
You want to move all of the data in a partition of a partitioned table to a nonpartitioned table.
Solution
Utilize the ALTER TABLE statement to move the data from a partition of a partitioned table to a nonpartitioned table. For example:
ALTER TABLE dbo.WebSiteHits SWITCH PARTITION 1 TO dbo.WebSiteHitsImport;
GO
-- see the data
SELECT WebSiteHitId,
WebSitePage,
HitDate,
$PARTITION.HitsDateRange(HitDate) Partition
FROM dbo.WebSiteHits;
SELECT WebSiteHitId,
WebSitePage,
HitDate,
$PARTITION.HitsDateRange(HitDate) Partition
FROM dbo.WebSiteHitsImport;
These queries return the following result sets:
WebSiteHitId WebSitePage HitDate Partition
------------ ----------- ----------------------- -----------
1 Home Page 2007-10-22 00:00:00.000 2
2 Home Page 2006-10-02 00:00:00.000 2
3 Sales Page 2008-05-09 00:00:00.000 3
5 Sales Page 2009-03-04 00:00:00.000 4
WebSiteHitId WebSitePage HitDate Partition
------------ ----------- ----------------------- -----------
1 Sales Page 2005-06-01 00:00:00.000 1
2 Main Page 2005-06-01 00:00:00.000 1
How It Works
In this example, the ALTER TABLE statement is utilized to move the data from a partition of the partitioned table to an empty, nonpartitioned table. Next, SELECT statements are run against the two tables to show where the data is within those tables. Since the destination table is not partitioned, the partition number on the destination table is not specified in the ALTER TABLE statement.
To move the data from one partition of a partitioned table to the nonpartitioned table, the nonpartitioned table must be in the same filegroup as the partition from which the data is to be removed, and the nonpartitioned table must be empty. Additionally, the nonpartitioned table must have the same structure (columns, indexes, constraints) as the partitioned table. In the prior recipe, the nonpartitioned column required an additional check constraint; this additional check constraint is not necessary when moving data into a nonpartitioned table. However, if you plan on moving the data back into the partitioned table, it is a good idea to add a check constraint when you create the unpartitioned table to ensure that data that would violate the partition does not get inserted into this table,
15-12. Reducing Table Locks on Partitioned Tables
Problem
Your partitioned table is incurring an excessive number of table locks, and you want to reduce them as much as you can.
Solution
Change the lock escalation of the table so as to lock at the partition level instead of at the table level.
ALTER TABLE dbo.WebSiteHits SET (LOCK_ESCALATION = AUTO);
How It Works
Locks on a table normally go from row to table. If a query is performing all of its activity in one partition of a partitioned table, it can be beneficial to change this behavior on the partitioned table to escalate from the row to the partition. This is performed utilizing the ALTER TABLE statement, as shown earlier.
Caution If queries that are locking different partitions need to expand their locks to other partitions, it is possible that this could increase the potential for deadlocks.
Note See the “Transactions, Locking, Blocking, Deadlocking” chapter for more information about lock escalation.
15-13. Removing Partition Functions and Schemes
Problem
You are no longer using a specific partition function or scheme, and you want to remove it from the database.
Solution
Utilize the DROP PARTITION SCHEME and DROP PARTITION FUNCTION statements to drop the partition scheme and function. For example:
DROP TABLE dbo.WebSiteHits;
DROP TABLE dbo.WebSiteHitsHistory;
DROP PARTITION SCHEME HitDateRangeScheme;
DROP PARTITION FUNCTION HitsDateRange;
GO
How It Works
Dropping a partition scheme or function requires that they are no longer bound to a table. In this example, we removed their usage by dropping the test tables that were utilizing the partition function and scheme. If you don’t want to lose this data, you should copy this data to another table. If your goal is to simply have all of the data in one partition, you can merge all of the partitions while keeping the partition scheme and function. (A partitioned table with a single partition is functionally equivalent to a nonpartitioned table.)
If you had originally created the table without any clustered indexes, you can use the CREATE INDEX DROP EXISTING option to rebuild the index without the partition scheme reference.
To remove the partition scheme, you utilize the DROP PARTITION SCHEME statement, specifying the name of the partition scheme to drop. To remove the partition function, you utilize the DROP PARTITION FUNCTION statement, specifying the name of the partition function to drop.
15-14. Easing VLDB Manageability (with Filegroups)
Problem
You have a very large database (VLDB) with some very large tables. You want to minimize the performance impact of these tables on the rest of the database.
Solution
Place the large tables on specific filegroups that are placed on different disks than the rest of the database.
How It Works
Filegroups are often used for very large databases because they can ease backup administration and potentially improve performance by distributing data over disk LUNs or arrays. When creating a table, you can specify that it be created on a specific filegroup. For example, if you have a table that you know will become very large, you can designate that it be created on a specific filegroup that can accommodate it.
The basic syntax for designating a table’s filegroup is as follows:
CREATE TABLE ...
[ ON {filegroup | "default" }] [ { TEXTIMAGE_ON { filegroup | "default" } ]
Table 15-6 details the arguments of this command.
Table 15-6. Arguments for Creating a Table on a Filegroup
Argument |
Description |
---|---|
filegroup |
This specifies the name of the filegroup on which the table will be created. |
"DEFAULT" |
This sets the table to be created on the default filegroup defined for the database. |
TEXTIMAGE_ON { filegroup | "DEFAULT" } |
This option stores in a separate filegroup the data from text, ntext, image, xml, varchar(max), nvarchar(max), and varbinary(max) data types. |
Recipe 15-1 demonstrated how to create additional filegroups in a database, and Recipe 15-9 demonstrated how to create a table on a specific filegroup.
Problem
You want to reduce the amount of disk space required for storing data in a table.
Solution
Utilize row or page data compression.
How It Works
Two forms of compression are available in SQL Server for tables, indexes, and filegroups: row-level and page-level compression.
Row-level compression applies variable-length storage to numeric data types (for example, int, bigint, and decimal) and fixed-length types such as money and datetime. Row-level compression also applies variable-length format to fixed-character strings and doesn’t store trailing blank characters or NULL and 0 values.
Page-level compression includes row-level compression and also adds prefix and dictionary compression. Prefix compression involves the storage of column prefix values that are stored multiple times in a column across rows and replaces the redundant prefixes with references to the single value. Dictionary compression occurs after prefix compression and involves finding repeated data values anywhere on the data page (not just prefixes) and then replacing the redundancies with a pointer to the single value.
To enable compression on a new table being created, utilize the DATA_COMPRESSION option in the CREATE TABLE statement and select either NONE, ROW, or PAGE.
CREATE TABLE dbo.DataCompressionTest
(
JobPostinglD INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
CandidatelD INT NOT NULL,
JobDESC CHAR(2000) NOT NULL
)
WITH (DATA_COMPRESSION = ROW);
GO
The following example creates a table and inserts 100,000 rows into this table consisting of a random integer in one column and a string consisting of 50 a characters. (The GO command, followed by a number, repeats that batch the specified number of times.)
CREATE TABLE dbo.ArchiveJobPosting
(
JobPostinglD INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
CandidatelD INT NOT NULL,
JobDESC CHAR(2000) NOT NULL
);
GO
INSERT dbo.ArchiveJobPosting
(CandidatelD,
JobDESC)
VALUES (CAST(RAND() * 10 AS INT),
REPLICATE('a', 50))
GO 100000
The sp_estimate_data_compression_savings system-stored procedure estimates the amount of disk savings if enabling row- or page-level compression. The stored procedure takes five arguments: the schema name of the table to be compressed, object name, index ID, partition number, and data-level compression method (NONE, ROW, or PAGE). The following example checks to see how much space can be saved by using row-level compression:
EXECUTE sp_estimate_data_compression_savings @schema_name = 'dbo', @object_name = 'ArchiveJobPosting', @index_id = NULL, @partition_number = NULL, @data_compression = 'ROW';
This returns the following information (results pivoted for readability):
object_name ArchiveJobPosting
schema_name dbo
index_id 1
partition_number 1
size_with_current_compression_setting(KB) 200752
size_with_requested_compression_setting(KB) 7344
sample_size_with_current_compression_setting(KB) 40656
sample_size_with_requested_compression_setting(KB) 1488
Note You may receive different results on your system.
As you can see from the stored procedure results, adding row-level compression would save more than 193,000 KB with the current data set. The sample size data is based on the stored procedure loading sample data into a cloned table in tempdb and validating the compression ratio accordingly.
The following example tests to see whether there are benefits to using page-level compression:
EXECUTE sp_estimate_data_compression_savings @schema_name = 'dbo', @object_name = 'ArchiveJobPosting', @index_id = NULL, @partition_number = NULL, @data_compression = 'PAGE';
This returns the following:
object_name ArchiveJobPosting
schema_name dbo
index_id 1
partition_number 1
size_with_current_compression_setting(KB) 200752
size_with_requested_compression_setting(KB) 1984
sample_size_with_current_compression_setting(KB) 40392
sample_size_with_requested_compression_setting(KB) 400
Note You may receive different results on your system.
Sure enough, the page-level compression shows additional benefits beyond row-level compression.
To turn page-level compression on for the table, execute the following statement:
ALTER TABLE dbo.ArchiveJobPosting REBUILD WITH (DATA_COMPRESSION = PAGE);
Data compression can also be configured at the partition level. In the next set of commands, a new partitioning function and scheme are created and applied to a new table. The table will use varying compression levels based on the partition.
CREATE PARTITION FUNCTION pfn_ArchivePart(int)
AS RANGE LEFT FOR VALUES (50000, 100000, 150000);
GO
CREATE PARTITION SCHEME psc_ArchivePart
AS PARTITION pfn_ArchivePart
TO (hitfg1, hitfg2, hitfg3, hitfg4);
GO
CREATE TABLE dbo.ArchiveJobPosting_V2
(
JobPostingID INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
CandidateID INT NOT NULL,
JobDesc CHAR(2000) NOT NULL
)
ON psc_ArchivePart(JobPostingID)
WITH (
DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3),
DATA_COMPRESSION = ROW ON PARTITIONS (4));
GO
The partitions you want to apply a data compression type to can be specified as a single partition number, a range of partitions with the starting and ending partitions separated by the TO keyword, or as a comma-delimited list of partition numbers and ranges. All of these partition options can be used at the same time.
If you want to change the compression level for any of the partitions, utilize the ALTER TABLE statement. This example changes partition 4 from row-level to page-level compression:
ALTER TABLE dbo.ArchiveJobPosting_V2
REBUILD PARTITION = 4
WITH (DATA_COMPRESSION = PAGE);
GO
15-16. Rebuilding a Heap
Problem
You have a heap (a table without a clustered index) that has become severely fragmented, and you want to reduce both the fragmentation and the number of forwarded records in the table.
Solution
Utilize the REBUILD option of the ALTER TABLE statement to rebuild a heap. For example:
CREATE TABLE dbo.HeapTest
(
HeapTest VARCHAR(1000)
);
GO
INSERT INTO dbo.HeapTest (HeapTest)
VALUES ('Test'),
GO 10000
SELECT index_type_desc,
fragment_count,
page_count,
forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED')
WHERE object_id = OBJECT_ID('dbo.HeapTest'),
GO
UPDATE dbo.HeapTest
SET HeapTest = REPLICATE('Test',250);
GO
SELECT index_type_desc,
fragment_count,
page_count,
forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED')
WHERE object_id = OBJECT_ID(' dbo.HeapTest'),
GO
ALTER TABLE dbo.HeapTest REBUILD;
GO
SELECT index_type_desc,
fragment_count,
page_count,
forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED')
WHERE object_id = OBJECT_ID(' dbo.HeapTest'),
GO
These queries return the following result sets:
index_type_desc fragment_count page_count forwarded_record_count
--------------- -------------------- -------------------- ----------------------
HEAP 4 23 0
index_type_desc fragment_count page_count forwarded_record_count
--------------- -------------------- -------------------- ----------------------
HEAP 5 1442 9934
index_type_desc fragment_count page_count forwarded_record_count
--------------- -------------------- -------------------- ----------------------
HEAP 4 1430 0
Note The fragment_count value will differ on your system and will even change if you run this recipe multiple times.
How It Works
In this example, a table is created with a single VARCHAR(1000) column, and 10,000 rows are added to this table with the value Test. An UPDATE statement is then run, which expands the data in this column to be Test repeated 250 times, for a total length of 1,000, which completely fills up the column. When the data is initially populated with the INSERT statement, the data pages are filled with as many rows as can fit. When the UPDATE statement is run, most of these rows have to move to other pages because fewer rows can fit onto a page. When rows are moved on a heap, a forwarding record is left in the place of the original row, causing an even further increased need for data pages.
During this process, the physical index statistics are being computed. From the results, it is obvious that the UPDATE statement causes a massive growth in the number of pages required to hold the data and in the number of forwarded records. After the table is rebuilt, the table now uses fewer pages, and the table no longer has any forwarded records.
3.21.43.192