Chapter 15: Managing Large Tables and Databases

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

CHAPTER 15

Managing Large Tables and Databases

by Wayne Sheffield

Very large tables (where you have row counts in the tens of millions) have special needs. All data manipulations and maintenance operations need special considerations. This chapter will deal with features in SQL Server that can help; specifically, I’ll cover how partitioning a table can ease data movements and how the use of filegroups and data compression can help you improve performance by distributing data I/O across multiple drives and having a smaller amount of data to store on disk.

Table partitioning provides you with a built-in method of horizontally partitioning data within a table or index while still maintaining a single logical object. Horizontal partitioning involves keeping the same number of columns in each partition but reducing the number of rows. Partitioning can ease the management of very large tables or indexes; data can be loaded into a partitioned table in seconds instead of minutes or hours; query performance can be improved; and you can perform maintenance operations more quickly, allowing for smaller maintenance windows. You can also improve performance by enabling lock escalation so as to lock at the partition level before locking at the table level. The recipes in this chapter will demonstrate how to use Transact-SQL commands to create, modify, and manage partitions and to partition database objects.

This chapter will also cover filegroup placement. Database data files belong to filegroups. Every database has a primary filegroup, and you can add additional filegroups as needed. The addition of new filegroups to a database is often used for very large databases (VLDBs) because filegroups can ease backup administration and potentially improve performance by distributing data over multiple arrays. Data compression is used to put more data in a given amount of space, reducing disk I/O at the cost of increased CPU usage in performing the compression and decompression to work with the data.

Note Files and filegroups are covered in detail in the “Files, Filegroups, and Integrity” chapter.

The recipes in this chapter will be utilizing your company’s database, MegaCorpData . The database and additional files will be created on your C: drive, in a folder named Apress. The database is created from the following script:

USE master;
GO

EXECUTE xp_create_subdir 'C:Apress';

IF DB_ID('MegaCorpData') IS NOT NULL DROP DATABASE MegaCorpData;
GO

CREATE DATABASE MegaCorpData
ON PRIMARY
(NAME = 'MegaCorpData',
 FILENAME = 'C:ApressMegaCorpData.MDF',
 SIZE = 3MB,
 MAXSIZE = UNLIMITED,
 FILEGROWTH = 1MB)
LOG ON
(NAME = 'MegaCorpData_Log',
 FILENAME = 'C:ApressMegaCorpData.LDF',
 SIZE = 3MB,
 MAXSIZE = UNLIMITED,
 FILEGROWTH = 1MB); 
GO

Note Table partitioning is a Developer or Enterprise edition (or higher) feature.

15-1. Partitioning a Table

Problem

You are adding a table (dbo.WebSiteHits) to your company’s database (MegaCorpData) to track each hit to your company’s web site. This table is expected to grow very large, very quickly. Because of its potential size, you are concerned that queries will not perform very well and that database backups may take longer than what your maintenance window allows for.

Solution

Partition the table into multiple filegroups, with each filegroup having its files on a different disk.

How It Works

The first step is to create multiple filegroups:

ALTER DATABASE MegaCorpData ADD FILEGROUP hitfg1;
ALTER DATABASE MegaCorpData ADD FILEGROUP hitfg2;
ALTER DATABASE MegaCorpData ADD FILEGROUP hitfg3;
ALTER DATABASE MegaCorpData ADD FILEGROUP hitfg4;

The next step is to add files to each filegroup:

ALTER DATABASE MegaCorpData
ADD FILE (NAME = mchitfg1,
          FILENAME = 'C:Apressmc_hitfg1.ndf',
          SIZE = 1MB)
TO FILEGROUP hitfg1;
ALTER DATABASE MegaCorpData
ADD FILE (NAME = mchitfg2,
          FILENAME = 'C:Apressmc_hitfg2.ndf',
          SIZE = 1MB)
TO FILEGROUP hitfg2;
ALTER DATABASE MegaCorpData
ADD FILE (NAME = mchitfg3,
          FILENAME = 'C:Apressmc_hitfg3.ndf',
          SIZE = 1MB)
TO FILEGROUP hitfg3;
ALTER DATABASE MegaCorpData
ADD FILE (NAME = mchitfg4,
          FILENAME = 'C:Apressmc_hitfg4.ndf',
          SIZE = 1MB)
TO FILEGROUP hitfg4;

Now that we have filegroups with files ready to receive data, we need to create a partition function, which will determine how the table will have its data horizontally partitioned by mapping rows to partitions based upon the value of a specified column:

USE MegaCorpData;
GO
CREATE PARTITION FUNCTION HitsDateRange (datetime)
AS RANGE LEFT FOR VALUES ('2006-01-01T00:00:00', '2007-01-01T00:00:00', '2008-01-01T00:00:00'),

The partition function specifies the name of the function, the data type, whether the range of boundaries is bound to the left or right (in this example, left was used), and the values that define the data in each boundary. You cannot specify a data type of text, ntext, image, xml, timestamp, varchar(max), varbinary(max), or nvarchar(max), nor can you use alias data types or CLR-defined data types. The number of values that you choose amounts to a total of n + 1 partitions. You can have up to 15,000 partitions, so you can specify up to 14,999 boundaries. If the values are not specified in order, the database engine sorts the values, creates the function, and returns a warning that the values were not provided in order. If there are any duplicate values, the database engine returns an error. The first partition contains values less than the lowest specified value, and the last partition contains values higher than the highest specified value. RANGE LEFT is used to specify that the upper boundary of each partition is the value specified; RANGE RIGHT is used to specify that the upper boundary of each partition is less than the specified value. In this case, we are specifying the first day of each year, creating yearly partitions. If you wanted to partition the data by month, you would just include values for the first of each month. Tables 15-1 and 15-2 show how the partition boundaries for the previous values are set for the specified dates.

Table 15-1. RANGE LEFT Boundaries

Partition #	Values
1	<= '2006-01-01'
2	> '2006-01-01' and <= '2007-01-01'
3	> '2007-01-01' and <= '2008-01-01'
4	> '2008-01-01'

Table 15-2. RANGE RIGHT Boundaries

Partition #	Values
1	< '2006-01-01'
2	>= '2006-01-01' and < '2007-01-01'
3	>= '2007-01-01' and < '2008-01-01'
4	>= '2008-01-01'

Once a partition function is created, it can be used in one or more partition schemes. A partition scheme maps the partitions defined in a partition function to actual filegroups. For example:

CREATE PARTITION SCHEME HitDateRangeScheme
AS PARTITION HitsDateRange
TO (hitfg1, hitfg2, hitfg3, hitfg4);

In this statement, you assign a name to the partition scheme and specify what partition function the scheme is bound to and which filegroups are assigned to each partition.

Now that all of the preliminary work is done, the new partitioned table can be built:

CREATE TABLE dbo.WebSiteHits (
    WebSiteHitID BIGINT NOT NULL IDENTITY(1, 1),
    WebSitePage VARCHAR(255) NOT NULL,
    HitDate DATETIME NOT NULL,
    CONSTRAINT PK_WebSiteHits PRIMARY KEY CLUSTERED (WebSiteHitId, HitDate)
)
ON [HitDateRangeScheme] (HitDate);

There are a couple of items to note about this CREATE TABLE statement. The first is the ON clause; it specifies which partition scheme to put the table on. The second item is the PRIMARY KEY constraint definition; while the primary key is unique with just the identity column (unless you deliberately add duplicate values to that column), the partitioning column has been added to it. This is because all unique indexes, including those that are automatically built from PRIMARY KEY and UNIQUE constraints, need to have the partitioning column included in the index key.

15-2. Locating Data in a Partition

Problem

You want to ensure that data is being stored in the expected partitions.

Solution

Utilize the $PARTITION function to return the partition that a row is stored in:

INSERT  dbo.WebSiteHits (WebSitePage, HitDate)
VALUES  ('Home Page', '2007-10-22T00:00:00'),
        ('Home Page', '2006-10-02T00:00:00'),
        ('Sales Page', '2008-05-09T00:00:00'),
        ('Sales Page', '2000-03-04T00:00:00'),

SELECT  WebSitePage,
        HitDate,
        $PARTITION.HitsDateRange (HitDate) AS [Partition]
FROM    dbo.WebSiteHits;

This query returns the following result set:

WebSitePage HitDate                 Partition
----------- ----------------------- -----------
Sales Page  2000-03-04 00:00:00.000 1
Home Page   2006-10-02 00:00:00.000 2
Home Page   2007-10-22 00:00:00.000 3
Sales Page  2008-05-09 00:00:00.000 4

How It Works

This example starts by inserting four rows into the table. Based on the dates inserted, each row should be in a separate partition. Next, a query is run to select the data from the table, and the query utilizes the $PARTITION function to return which partition the data is in. The syntax of the $PARTITION function is as follows:

$PARTITION.partition_function_name(expression)

where partition_function_name is the name of the partition function used to partition the table, and expression is the name of the partitioning column.

The $PARTITION function evaluates each HitDate and determines which partition said data is stored in based on the partition function. This allows you to see how your data is stored and how it is distributed across the different partitions. If one partition has an uneven distribution, you can explore creating new partitions or removing existing partitions, both of which are demonstrated in the upcoming recipes.

15-3. Adding a Partition

Problem

You’re into the last year that your partition scheme covers, so you need to add partitions.

Solution

Utilize the ALTER PARTITION SCHEME and ALTER PARTITION FUNCTION statements to extend the partition onto a new or existing filegroup and to create the new partition. For example:

ALTER PARTITION SCHEME HitDateRangeScheme NEXT USED [PRIMARY];
GO

ALTER PARTITION FUNCTION HitsDateRange () SPLIT RANGE ('2009-01-01T00:00:00'),
GO

How It Works

This example starts by using the ALTER PARTITION SCHEME statement to designate the next partition filegroup to use. The syntax for ALTER PARTITION SCHEME is as follows:

ALTER PARTITION SCHEME partition_scheme_name NEXT USED [ filegroup_name ]

where partition_scheme_name is the name of the partition scheme to modify. NEXT USED [filegroup_name] queues the specified filegroup to be used next by the next new partition created with an ALTER PARTITION FUNCTION statement.

In a given partition scheme, you can have only one filegroup that is designated NEXT USED. The filegroup does not need to be empty to be used.

In this example, we are specifying that the PRIMARY filegroup will be the filegroup that the next partition is placed on.

Next, the example uses the ALTER PARTITION FUNCTION statement to create (split) the new partition by splitting the partition boundaries. The syntax for ALTER PARTITION FUNCTION is as follows:

ALTER PARTITION FUNCTION partition_function_name() {
SPLIT RANGE ( boundary_value ) | MERGE RANGE ( boundary_value ) }

where partition_function_name is the name of the partition function to add or remove a partition from. SPLIT RANGE is used to create a new partition by defining a new boundary value; MERGE RANGE is used to remove an existing partition at the specified boundary and to move any existing records to another partition.

The existing partition is split, using the original boundary type of LEFT or RIGHT. You can split only one partition at a time. After this split, the partition layout now looks like Table 15-3.

Table 15-3. New RANGE LEFT Boundaries

Partition #	Values
1	<= '2006-01-01'
2	> '2006-01-01' and <= '2007-01-01'
3	> '2007-01-01' and <= '2008-01-01'
4	> '2008-01-01' and <= '2009-01-01'
5	> '2009-01-01'

Once the new partition is created, any new row added that qualifies to go to the new partition will be stored in that partition.

INSERT  dbo.WebSiteHits
        (WebSitePage, HitDate)
VALUES  ('Sales Page', '2009-03-04T00:00:00'),

SELECT  WebSitePage,
        HitDate,
        $PARTITION.HitsDateRange (HitDate) AS [Partition]
FROM    dbo.WebSiteHits;

This query returns the following result set:

WebSitePage HitDate                 Partition
----------- ----------------------- -----------
Sales Page  2000-03-04 00:00:00.000 1
Home Page   2006-10-02 00:00:00.000 2
Home Page   2007-10-22 00:00:00.000 3
Sales Page  2008-05-09 00:00:00.000 4
Sales Page  2009-03-04 00:00:00.000 5

15-4. Removing a Partition

Problem

You need to remove a partition and move the data in that partition into another partition.

Solution

Utilize the ALTER PARTITION FUNCTION statement to remove a partition and merge the data in that partition into another partition. For example:

ALTER PARTITION FUNCTION HitsDateRange () MERGE RANGE ('2007-01-01T00:00:00'),
GO

SELECT  WebSitePage,
        HitDate,
        $PARTITION.HitsDateRange(HitDate) Partition
FROM    dbo.WebSiteHits;

This query returns the following result set:

WebSitePage HitDate                 Partition
----------- ----------------------- -----------
Sales Page  2000-03-04 00:00:00.000 1
Home Page   2007-10-22 00:00:00.000 2
Home Page   2006-10-02 00:00:00.000 2
Sales Page  2008-05-09 00:00:00.000 3
Sales Page  2009-03-04 00:00:00.000 4

How It Works

Recipe 15-3 showed the syntax for the ALTER PARTITION FUNCTION statement, including a description of the MERGE RANGE functionality that is used to remove an existing partition. Removing a partition merges the specified partition with the preceding partition, with the rows being moved into the new partition.

In this example, the partition with the boundary '2007-01-01' is removed. When the table is queried, you can see that the row in the year 2007 has been moved from the third partition to the second partition.

Table 15-4 shows the new partition layout.

Table 15-4. New RANGE LEFT Boundaries

Partition #	Values
1	<= '2006-01-01'
2	> '2006-01-01' and <= '2008-01-01'
3	> '2008-01-01' and <= '2009-01-01'
4	> '2009-01-01'

15-5. Determining Whether a Table Is Partitioned

Problem

You need to determine whether a table is partitioned.

Solution

Query the sys.partitions system view to determine the partitions on an object. For example:

SELECT  p.partition_id,
        p.object_id,
        p.partition_number
FROM    sys.partitions AS p
WHERE   p.partition_id IS NOT NULL
AND     p.object_id = OBJECT_ID('dbo.WebSiteHits'),

This query returns the following result set:

partition_id         object_id   partition_number
-------------------- ----------- ----------------
72057594039042048    245575913   1
72057594039173120    245575913   2
72057594039238656    245575913   4
72057594039304192    245575913   3

Note The partition_id and object_id values will be different on your system.

How It Works

The system view sys.partitions contains a row for each partition of a table as well as for most types of indexes. (All tables contain at least one partition, whether they are specifically partitioned or not.)

15-6. Determining the Boundary Values for a Partitioned Table

Problem

You want to determine what the existing boundaries are for a partition function.

Solution

Query the system views to obtain this information. For example:

SELECT  t.name AS TableName,
        i.name AS IndexName,
        p.partition_number AS [Part#],
        f.type_desc,
        CASE WHEN f.boundary_value_on_right = 1 THEN 'RIGHT' ELSE 'LEFT' END AS BoundaryType,
        r.boundary_id,
        r.value AS BoundaryValue
FROM    sys.tables AS t
        JOIN sys.indexes AS i
            ON t.object_id = i.object_id
        JOIN sys.partitions AS p
            ON i.object_id = p.object_id
               AND i.index_id = p.index_id
        JOIN sys.partition_schemes AS s
            ON i.data_space_id = s.data_space_id
        JOIN sys.partition_functions AS f
            ON s.function_id = f.function_id
        LEFT JOIN sys.partition_range_values AS r
            ON f.function_id = r.function_id
               AND r.boundary_id = p.partition_number
WHERE   t.object_id = OBJECT_ID('dbo.WebSiteHits')
AND     i.type <= 1
ORDER BY p.partition_number;

This query returns the following result set:

TableName   IndexName      Part#  type_desc BoundaryType boundary_id BoundaryValue
----------- -------------- ------ --------- ------------ ----------- -----------------------
WebSiteHits PK_WebSiteHits 1      RANGE     LEFT         1           2006-01-01 00:00:00.000
WebSiteHits PK_WebSiteHits 2      RANGE     LEFT         2           2008-01-01 00:00:00.000
WebSiteHits PK_WebSiteHits 3      RANGE     LEFT         3           2009-01-01 00:00:00.000
WebSiteHits PK_WebSiteHits 4      RANGE     LEFT         NULL        NULL

How It Works

The sys.partition_range_values system view contains the information about boundary values for a partition function. Join to the other system views to return more information, such as the table, index, partition number, and type of boundary.

15-7. Determining the Partitioning Column for a Partitioned Table

Problem

You need to determine which column is the partitioning column on a partitioned table.

Solution

Query the system views to obtain the partitioning column for a table. For example:

SELECT  t.object_id AS Object_ID,
        t.name AS TableName,
        ic.column_id AS PartitioningColumnID,
        c.name AS PartitioningColumnName
FROM    sys.tables AS t
        JOIN sys.indexes AS i
            ON t.object_id = i.object_id
        JOIN sys.partition_schemes AS ps
            ON ps.data_space_id = i.data_space_id
        JOIN sys.index_columns AS ic
            ON ic.object_id = i.object_id
               AND ic.index_id = i.index_id
               AND ic.partition_ordinal > 0
        JOIN sys.columns AS c
            ON t.object_id = c.object_id
               AND ic.column_id = c.column_id
WHERE   t.object_id = OBJECT_ID('dbo.WebSiteHits')
AND     i.type <= 1;

This query returns the following result set:

Object_ID   TableName   PartitioningColumnID PartitioningColumnName
----------- ----------- -------------------- ----------------------
773577794   WebSiteHits 3                    HitDate

How It Works

The system views sys.partition_schemes and sys.index_columns can be joined together and, with other system views, can be used to determine which column is the partitioning column.

15-8. Determining the NEXT USED Partition

Problem

When splitting a partition (as shown in Recipe 15-3), you started off by specifying the NEXT USED partition. However, there was a problem during the split, and you need to determine which partition is currently set to be used next.

Solution

Query the system views to determine the NEXT USED partition:

SELECT  PartitionSchemaName,
        NextUsedPartition = FileGroupName
FROM    (SELECT FileGroupName = FG.name,
                PartitionSchemaName = PS.name,
                RANK() OVER (PARTITION BY PS.name ORDER BY DestDS.destination_id) AS dest_rank
         FROM   sys.partition_schemes PS
                JOIN sys.destination_data_spaces AS DestDS
                    ON DestDS.partition_scheme_id = PS.data_space_id
                JOIN sys.filegroups AS FG
                    ON FG.data_space_id = DestDS.data_space_id
                LEFT JOIN sys.partition_range_values AS PRV
                    ON PRV.boundary_id = DestDS.destination_id
                       AND PRV.function_id = PS.function_id
         WHERE  PRV.value IS NULL
        ) AS a
WHERE   dest_rank = 2;

How It Works

When NEXT USED is specified, there will be two partitions listed in the sys. destination_data_spaces view that are not included in sys.partition_range_values (there will always be one that represents the infinity range). If there is a second one present, it represents the partition that is set to be NEXT USED. We can get the NEXT USED partition for each partition where a NEXT USED has been specified by using joins. This is achieved by first performing a JOIN to sys.partition_schemes (to acquire the partition scheme name) and to sys.destination_data_spaces and sys.filegroups (to get the filegroup name). Next, a LEFT JOIN is performed to sys.partition_range_values to get the second occurrence (via the RANK function) of its NULL value (signifying no match on the join).

15-9. Moving a Partition to a Different Partitioned Table

Problem

You want to move the older data in your partitioned table to a history table.

Solution

Utilize the ALTER TABLE statement to move partitions between tables. For example:

CREATE TABLE dbo.WebSiteHitsHistory
       (
        WebSiteHitID BIGINT NOT NULL IDENTITY,
        WebSitePage VARCHAR(255) NOT NULL,
        HitDate DATETIME NOT NULL,
        CONSTRAINT PK_WebSiteHitsHistory PRIMARY KEY (WebSiteHitID, HitDate)
       )
ON     [HitDateRangeScheme](HitDate);
GO

ALTER TABLE dbo.WebSiteHits SWITCH PARTITION 1 TO dbo.WebSiteHitsHistory PARTITION 1;
GO

SELECT  WebSitePage,
        HitDate,
        $PARTITION.HitsDateRange(HitDate) Partition
FROM    dbo.WebSiteHits;
SELECT  WebSitePage,
        HitDate,
        $PARTITION.HitsDateRange(HitDate) Partition
FROM    dbo.WebSiteHitsHistory;

These queries return the following result sets:

WebSitePage HitDate                 Partition
----------- ----------------------- -----------
Home Page   2007-10-22 00:00:00.000 2
Home Page   2006-10-02 00:00:00.000 2
Sales Page  2008-05-09 00:00:00.000 3
Sales Page  2009-03-04 00:00:00.000 4

WebSitePage HitDate                 Partition
----------- ----------------------- -----------
Sales Page  2000-03-04 00:00:00.000 1

How It Works

With SQL Server’s partitioning functionality, you can transfer partitions between different tables with a minimum of effort or overhead. Partitions are transferred between tables using the ALTER TABLE...SWITCH statement. Transfers can take place in three ways: switching a partition from one partitioned table to another partitioned table (both tables need to be partitioned on the same column), transferring an entire table from a nonpartitioned table to a partitioned table, or moving a partition from a partitioned table to a nonpartitioned table. The basic syntax of the ALTER TABLE statement used to switch partitions is as follows:

ALTER TABLE [ schema_name. ] tablename
SWITCH [ PARTITION source_partition_number_expression ]
TO [ schema_name. ] target_table
[ PARTITION target_partition_number_expression ]

Table 15-5 details the arguments of this command.

Table 15-5. ALTER TABLE...SWITCH Arguments

Argument	Description
[ schema_name. ] tablename	The source table to move the partition from
source_partition_number_expression	The partition number being relocated
[ schema_name. ] target_table	The target table to receive the partition
partition.target_partition_number_expression	The destination partition number

This example starts by creating a history table (WebSiteHitsHistory). Next, the ALTER TABLE statement is used to move partition 1 from the WebSiteHits table to partition 1 of the WebSiteHitsHistory table. Finally, both tables are queried to show the data that is in each table and which partition the data is in.

Moving partitions between tables is much faster than performing a manual row operation (INSERT...SELECT, for example) because you aren’t actually moving physical data. Instead, you are only changing the metadata regarding which table the partition is currently associated with. Also, keep in mind that the target partition of any existing table needs to be empty so as to accommodate the incoming partition. If it is a nonpartitioned table, the table must be empty.

15-10. Moving Data from a Nonpartitioned Table to a Partition in a Partitioned Table

Problem

You have just found the long-lost spreadsheet that the original web site designer saved the web hits into. You have loaded this data into a table, and you want to add it to your WebSiteHits table.

Solution

Utilize the ALTER TABLE statement to move the data from the nonpartitioned table to an empty partition in the partitioned table. For example:

IF OBJECT_ID('dbo.WebSiteHitsImport','U') IS NOT NULL DROP TABLE dbo.WebSiteHitsImport;
GO
CREATE TABLE dbo.WebSiteHitsImport
       (
        WebSiteHitID BIGINT NOT NULL IDENTITY,
        WebSitePage VARCHAR(255) NOT NULL,
        HitDate DATETIME NOT NULL,
        CONSTRAINT PK_WebSiteHitsImport PRIMARY KEY (WebSiteHitID, HitDate),
        CONSTRAINT CK_WebSiteHitsImport CHECK (HitDate <= '2006-01-01T00:00:00')
       )
ON hitfg1;
GO
INSERT INTO dbo.WebSiteHitsImport (WebSitePage, HitDate)
VALUES ('Sales Page', '2005-06-01T00:00:00'),
       ('Main Page', '2005-06-01T00:00:00'),
GO

-- partition 1 is empty – move data to this partition
ALTER TABLE dbo.WebSiteHitsImport SWITCH TO dbo.WebSiteHits PARTITION 1;
GO

-- see the data
SELECT  WebSiteHitId,
        WebSitePage,
        HitDate,
        $PARTITION.HitsDateRange(HitDate) Partition
FROM    dbo.WebSiteHits;
SELECT  WebSiteHitId,
        WebSitePage,
        HitDate,
        $PARTITION.HitsDateRange(HitDate) Partition
FROM    dbo.WebSiteHitsImport;

These queries return the following result sets:

WebSiteHitId         WebSitePage HitDate                 Partition
-------------------- ----------- ----------------------- -----------
1                    Sales Page  2005-06-01 00:00:00.000 1
2                    Main Page   2005-06-01 00:00:00.000 1
1                    Home Page   2007-10-22 00:00:00.000 2
2                    Home Page   2006-10-02 00:00:00.000 2
3                    Sales Page  2008-05-09 00:00:00.000 3
5                    Sales Page  2009-03-04 00:00:00.000 4

WebSiteHitId         WebSitePage HitDate                 Partition
-------------------- ----------- ----------------------- -----------

How It Works

In this example, we first create a new, nonpartitioned table that the imported data will be loaded into and then insert some records into that table. Next, the ALTER TABLE statement is utilized to move the data from the new, nonpartitioned table into an empty partition in the partitioned table. Finally, SELECT statements are run against the two tables to show where the data is within those tables. Since the source table is not partitioned, the partition number on the source table is not specified in the ALTER TABLE statement.

To move the data from one table to the partitioned table, the table whose data is being moved must be in the same filegroup as the partition that the data is to be moved into for the partitioned table. Additionally, the table whose data is being moved must have the same structure (columns, indexes, constraints) as the partitioned table, and it must have an additional check constraint that enforces that the data in the partitioned column has the same allowable values as the corresponding partition on the partitioned table. Finally, the partition on the partitioned table that the data is being moved to must be empty. Since this is a metadata operation (assigning the existing data pages from one table to another), it makes sense that the data must exist in the same filegroup as the partition and that the partition is empty; otherwise, data would need to be moved through INSERT...SELECT statements.

Caution In this example, both tables have an identity column. If you look at the returned results, there are duplicate values for this identity column. Since the unique constraints include the partitioning column values in addition to the identity column values, these values are valid even though duplicated identity column values are not normally seen.

15-11. Moving a Partition from a Partitioned Table to a Nonpartitioned Table

Problem

You want to move all of the data in a partition of a partitioned table to a nonpartitioned table.

Solution

Utilize the ALTER TABLE statement to move the data from a partition of a partitioned table to a nonpartitioned table. For example:

ALTER TABLE dbo.WebSiteHits SWITCH PARTITION 1 TO dbo.WebSiteHitsImport;
GO

-- see the data
SELECT  WebSiteHitId,
        WebSitePage,
        HitDate,
        $PARTITION.HitsDateRange(HitDate) Partition
FROM    dbo.WebSiteHits;
SELECT  WebSiteHitId,
        WebSitePage,
        HitDate,
        $PARTITION.HitsDateRange(HitDate) Partition
FROM    dbo.WebSiteHitsImport;

These queries return the following result sets:

WebSiteHitId WebSitePage HitDate                 Partition
------------ ----------- ----------------------- -----------
1            Home Page   2007-10-22 00:00:00.000 2
2            Home Page   2006-10-02 00:00:00.000 2
3            Sales Page  2008-05-09 00:00:00.000 3
5            Sales Page  2009-03-04 00:00:00.000 4

WebSiteHitId WebSitePage HitDate                 Partition
------------ ----------- ----------------------- -----------
1            Sales Page  2005-06-01 00:00:00.000 1
2            Main Page   2005-06-01 00:00:00.000 1

How It Works

In this example, the ALTER TABLE statement is utilized to move the data from a partition of the partitioned table to an empty, nonpartitioned table. Next, SELECT statements are run against the two tables to show where the data is within those tables. Since the destination table is not partitioned, the partition number on the destination table is not specified in the ALTER TABLE statement.

To move the data from one partition of a partitioned table to the nonpartitioned table, the nonpartitioned table must be in the same filegroup as the partition from which the data is to be removed, and the nonpartitioned table must be empty. Additionally, the nonpartitioned table must have the same structure (columns, indexes, constraints) as the partitioned table. In the prior recipe, the nonpartitioned column required an additional check constraint; this additional check constraint is not necessary when moving data into a nonpartitioned table. However, if you plan on moving the data back into the partitioned table, it is a good idea to add a check constraint when you create the unpartitioned table to ensure that data that would violate the partition does not get inserted into this table,

15-12. Reducing Table Locks on Partitioned Tables

Problem

Your partitioned table is incurring an excessive number of table locks, and you want to reduce them as much as you can.

Solution

Change the lock escalation of the table so as to lock at the partition level instead of at the table level.

ALTER TABLE dbo.WebSiteHits SET (LOCK_ESCALATION = AUTO);

How It Works

Locks on a table normally go from row to table. If a query is performing all of its activity in one partition of a partitioned table, it can be beneficial to change this behavior on the partitioned table to escalate from the row to the partition. This is performed utilizing the ALTER TABLE statement, as shown earlier.

Caution If queries that are locking different partitions need to expand their locks to other partitions, it is possible that this could increase the potential for deadlocks.

Note See the “Transactions, Locking, Blocking, Deadlocking” chapter for more information about lock escalation.

15-13. Removing Partition Functions and Schemes

Problem

You are no longer using a specific partition function or scheme, and you want to remove it from the database.

Solution

Utilize the DROP PARTITION SCHEME and DROP PARTITION FUNCTION statements to drop the partition scheme and function. For example:

DROP TABLE dbo.WebSiteHits;
DROP TABLE dbo.WebSiteHitsHistory;
DROP PARTITION SCHEME HitDateRangeScheme;
DROP PARTITION FUNCTION HitsDateRange;
GO

How It Works

Dropping a partition scheme or function requires that they are no longer bound to a table. In this example, we removed their usage by dropping the test tables that were utilizing the partition function and scheme. If you don’t want to lose this data, you should copy this data to another table. If your goal is to simply have all of the data in one partition, you can merge all of the partitions while keeping the partition scheme and function. (A partitioned table with a single partition is functionally equivalent to a nonpartitioned table.)

If you had originally created the table without any clustered indexes, you can use the CREATE INDEX DROP EXISTING option to rebuild the index without the partition scheme reference.

To remove the partition scheme, you utilize the DROP PARTITION SCHEME statement, specifying the name of the partition scheme to drop. To remove the partition function, you utilize the DROP PARTITION FUNCTION statement, specifying the name of the partition function to drop.

15-14. Easing VLDB Manageability (with Filegroups)

Problem

You have a very large database (VLDB) with some very large tables. You want to minimize the performance impact of these tables on the rest of the database.

Solution

Place the large tables on specific filegroups that are placed on different disks than the rest of the database.

How It Works

Filegroups are often used for very large databases because they can ease backup administration and potentially improve performance by distributing data over disk LUNs or arrays. When creating a table, you can specify that it be created on a specific filegroup. For example, if you have a table that you know will become very large, you can designate that it be created on a specific filegroup that can accommodate it.

The basic syntax for designating a table’s filegroup is as follows:

CREATE TABLE ...
[ ON {filegroup | "default" }] [ { TEXTIMAGE_ON { filegroup | "default" } ]

Table 15-6 details the arguments of this command.

Table 15-6. Arguments for Creating a Table on a Filegroup

Argument	Description
filegroup	This specifies the name of the filegroup on which the table will be created.
"DEFAULT"	This sets the table to be created on the default filegroup defined for the database.
TEXTIMAGE_ON { filegroup \| "DEFAULT" }	This option stores in a separate filegroup the data from text, ntext, image, xml, varchar(max), nvarchar(max), and varbinary(max) data types.

Recipe 15-1 demonstrated how to create additional filegroups in a database, and Recipe 15-9 demonstrated how to create a table on a specific filegroup.

15-15. Compressing Table Data

Problem

You want to reduce the amount of disk space required for storing data in a table.

Solution

Utilize row or page data compression.

How It Works

Two forms of compression are available in SQL Server for tables, indexes, and filegroups: row-level and page-level compression.

Row-level compression applies variable-length storage to numeric data types (for example, int, bigint, and decimal) and fixed-length types such as money and datetime. Row-level compression also applies variable-length format to fixed-character strings and doesn’t store trailing blank characters or NULL and 0 values.

Page-level compression includes row-level compression and also adds prefix and dictionary compression. Prefix compression involves the storage of column prefix values that are stored multiple times in a column across rows and replaces the redundant prefixes with references to the single value. Dictionary compression occurs after prefix compression and involves finding repeated data values anywhere on the data page (not just prefixes) and then replacing the redundancies with a pointer to the single value.

To enable compression on a new table being created, utilize the DATA_COMPRESSION option in the CREATE TABLE statement and select either NONE, ROW, or PAGE.

CREATE TABLE dbo.DataCompressionTest
       (
        JobPostinglD INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
        CandidatelD INT NOT NULL,
        JobDESC CHAR(2000) NOT NULL
       )
WITH (DATA_COMPRESSION = ROW);
GO

The following example creates a table and inserts 100,000 rows into this table consisting of a random integer in one column and a string consisting of 50 a characters. (The GO command, followed by a number, repeats that batch the specified number of times.)

CREATE TABLE dbo.ArchiveJobPosting
       (
        JobPostinglD INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
        CandidatelD INT NOT NULL,
        JobDESC CHAR(2000) NOT NULL
       );
GO

INSERT  dbo.ArchiveJobPosting
        (CandidatelD,
         JobDESC)
VALUES (CAST(RAND() * 10 AS INT),
        REPLICATE('a', 50))
GO 100000

The sp_estimate_data_compression_savings system-stored procedure estimates the amount of disk savings if enabling row- or page-level compression. The stored procedure takes five arguments: the schema name of the table to be compressed, object name, index ID, partition number, and data-level compression method (NONE, ROW, or PAGE). The following example checks to see how much space can be saved by using row-level compression:

EXECUTE sp_estimate_data_compression_savings @schema_name = 'dbo', @object_name = 'ArchiveJobPosting', @index_id = NULL, @partition_number = NULL, @data_compression = 'ROW';

This returns the following information (results pivoted for readability):

object_name                                             ArchiveJobPosting
schema_name                                             dbo
index_id                                                1
partition_number                                        1
size_with_current_compression_setting(KB)               200752
size_with_requested_compression_setting(KB)             7344
sample_size_with_current_compression_setting(KB)        40656
sample_size_with_requested_compression_setting(KB)      1488

Note You may receive different results on your system.

As you can see from the stored procedure results, adding row-level compression would save more than 193,000 KB with the current data set. The sample size data is based on the stored procedure loading sample data into a cloned table in tempdb and validating the compression ratio accordingly.

The following example tests to see whether there are benefits to using page-level compression:

EXECUTE sp_estimate_data_compression_savings @schema_name = 'dbo', @object_name = 'ArchiveJobPosting', @index_id = NULL, @partition_number = NULL, @data_compression = 'PAGE';

This returns the following:

object_name                                             ArchiveJobPosting
schema_name                                             dbo
index_id                                                1
partition_number                                        1
size_with_current_compression_setting(KB)               200752
size_with_requested_compression_setting(KB)             1984
sample_size_with_current_compression_setting(KB)        40392
sample_size_with_requested_compression_setting(KB)      400

Note You may receive different results on your system.

Sure enough, the page-level compression shows additional benefits beyond row-level compression.

To turn page-level compression on for the table, execute the following statement:

ALTER TABLE dbo.ArchiveJobPosting REBUILD WITH (DATA_COMPRESSION = PAGE);

Data compression can also be configured at the partition level. In the next set of commands, a new partitioning function and scheme are created and applied to a new table. The table will use varying compression levels based on the partition.

CREATE PARTITION FUNCTION pfn_ArchivePart(int)
AS RANGE LEFT FOR VALUES (50000, 100000, 150000);
GO
CREATE PARTITION SCHEME psc_ArchivePart
AS PARTITION pfn_ArchivePart
TO (hitfg1, hitfg2, hitfg3, hitfg4);
GO
CREATE TABLE dbo.ArchiveJobPosting_V2
       (
        JobPostingID INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
        CandidateID INT NOT NULL,
        JobDesc CHAR(2000) NOT NULL
       )
ON     psc_ArchivePart(JobPostingID)
WITH (
    DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3),
    DATA_COMPRESSION = ROW ON PARTITIONS (4));
GO

The partitions you want to apply a data compression type to can be specified as a single partition number, a range of partitions with the starting and ending partitions separated by the TO keyword, or as a comma-delimited list of partition numbers and ranges. All of these partition options can be used at the same time.

If you want to change the compression level for any of the partitions, utilize the ALTER TABLE statement. This example changes partition 4 from row-level to page-level compression:

ALTER TABLE dbo.ArchiveJobPosting_V2
REBUILD PARTITION = 4
WITH (DATA_COMPRESSION = PAGE);
GO

15-16. Rebuilding a Heap

Problem

You have a heap (a table without a clustered index) that has become severely fragmented, and you want to reduce both the fragmentation and the number of forwarded records in the table.

Solution

Utilize the REBUILD option of the ALTER TABLE statement to rebuild a heap. For example:

CREATE TABLE dbo.HeapTest
(
        HeapTest VARCHAR(1000)
);
GO
INSERT INTO dbo.HeapTest (HeapTest)
VALUES ('Test'),
GO 10000
SELECT  index_type_desc,
        fragment_count,
        page_count,
        forwarded_record_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED')
WHERE   object_id = OBJECT_ID('dbo.HeapTest'),
GO
UPDATE dbo.HeapTest
SET HeapTest = REPLICATE('Test',250);
GO
SELECT  index_type_desc,
        fragment_count,
        page_count,
        forwarded_record_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED')
WHERE   object_id = OBJECT_ID(' dbo.HeapTest'),
GO
ALTER TABLE dbo.HeapTest REBUILD;
GO

SELECT  index_type_desc,
        fragment_count,
        page_count,
        forwarded_record_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED')
WHERE   object_id = OBJECT_ID(' dbo.HeapTest'),
GO

These queries return the following result sets:

index_type_desc fragment_count       page_count           forwarded_record_count
--------------- -------------------- -------------------- ----------------------
HEAP            4                    23                   0

index_type_desc fragment_count       page_count           forwarded_record_count
--------------- -------------------- -------------------- ----------------------
HEAP            5                    1442                 9934

index_type_desc fragment_count       page_count           forwarded_record_count
--------------- -------------------- -------------------- ----------------------
HEAP            4                    1430                 0

Note The fragment_count value will differ on your system and will even change if you run this recipe multiple times.

How It Works

In this example, a table is created with a single VARCHAR(1000) column, and 10,000 rows are added to this table with the value Test. An UPDATE statement is then run, which expands the data in this column to be Test repeated 250 times, for a total length of 1,000, which completely fills up the column. When the data is initially populated with the INSERT statement, the data pages are filled with as many rows as can fit. When the UPDATE statement is run, most of these rows have to move to other pages because fewer rows can fit onto a page. When rows are moved on a heap, a forwarding record is left in the place of the original row, causing an even further increased need for data pages.

During this process, the physical index statistics are being computed. From the results, it is obvious that the UPDATE statement causes a massive growth in the number of pages required to hold the data and in the number of forwarded records. After the table is rebuilt, the table now uses fewer pages, and the table no longer has any forwarded records.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 15: Managing Large Tables and Databases

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 15: Managing Large Tables and Databases