Partitioned Views

Simply put, a partitioned view is a view that unions tables together that serve as partitions—or sections—of a larger set of data. For example, a partitioned view over the activity on a Web site might union together separate tables for each month of the year. Each of these tables would store the activity data for a particular month. By unioning them together, the partitioned view would allow them to be treated as a single table while still keeping their sizes manageable.

There are two types of partitioned views—local partitioned views (LPVs) and distributed partitioned views (DPVs). An LPV is a view in which all the underlying tables reside on the same SQL Server instance. A DPV is a view in which they live on separate instances. These instances don't have to be on different machines, but usually they are. Spreading a DPV across multiple machines helps “scale out” large SQL Server implementations. It can effectively bring the processing power and resources of many machines together to process a single query.

A partitioned view is a normal view object that unions together tables with certain attributes. It is characterized by tables with the same structure that are combined to provide a unified view of a set of data but are segmented on a clearly defined “partitioning column.” This partitioning column is set up using a CHECK constraint. In addition to doing what CHECK constraints always do—namely, controlling the type of data a column will accept—the partitioning column of an LPV or a DPV provides a means for the SQL Server query optimizer to be able to determine which partition a given column value resides in as it creates a query plan. This allows it to eliminate searches for the other partitions from the query plan when optimizing a query that includes the partitioning column.

This is best understood by way of example (Listing 17.1).

Listing 17.1. A Basic Partitioned View
CREATE TABLE CustomersUS (
  CustomerID nchar (5) NOT NULL,
  CompanyName nvarchar (40) NOT NULL ,
  ContactName nvarchar (30) NULL ,
  ContactTitle nvarchar (30) NULL ,
  Address nvarchar (60) NULL ,
  City nvarchar (15) NULL ,
  Region nvarchar (15) NULL ,
  PostalCode nvarchar (10) NULL ,
  Country nvarchar (15) NOT NULL CHECK (Country='US'),
  Phone nvarchar (24) NULL ,
  Fax nvarchar (24) NULL,
CONSTRAINT PK_CustUS PRIMARY KEY (Country, CustomerID)
)

CREATE TABLE CustomersUK (
  CustomerID nchar (5) NOT NULL,
  CompanyName nvarchar (40) NOT NULL ,
  ContactName nvarchar (30) NULL ,
  ContactTitle nvarchar (30) NULL ,
  Address nvarchar (60) NULL ,
  City nvarchar (15) NULL ,
  Region nvarchar (15) NULL ,
  PostalCode nvarchar (10) NULL ,
  Country nvarchar (15) NOT NULL CHECK (Country='UK'),
  Phone nvarchar (24) NULL ,
  Fax nvarchar (24) NULL,
CONSTRAINT PK_CustUK PRIMARY KEY (Country, CustomerID)
)

CREATE TABLE CustomersFrance (
  CustomerID nchar (5) NOT NULL,
  CompanyName nvarchar (40) NOT NULL ,
  ContactName nvarchar (30) NULL ,
  ContactTitle nvarchar (30) NULL ,
  Address nvarchar (60) NULL ,
  City nvarchar (15) NULL ,
  Region nvarchar (15) NULL ,
  PostalCode nvarchar (10) NULL ,
  Country nvarchar (15) NOT NULL CHECK (Country='France'),
  Phone nvarchar (24) NULL ,
  Fax nvarchar (24) NULL,
CONSTRAINT PK_CustFR PRIMARY KEY (Country, CustomerID)
)

GO

DROP VIEW CustomersV
GO

CREATE VIEW CustomersV
AS
SELECT * FROM dbo.CustomersUS
UNION ALL
SELECT * FROM dbo.CustomersUK
UNION ALL
SELECT * FROM dbo.CustomersFrance
GO

As you can see, we create three tables to store partitions, or slices, of a customer table. We then union those tables back together using a partitioned view. What's the point? Why not store the data as a single table? There are two advantages to using this approach: (1) By segmenting the customer data, we keep the tables more manageable—the individual partitions will be a fraction of the size of the entire customer table; and (2) SQL Server's query optimizer can recognize partitioned views and automatically determine the right underlying table to query based on the filter criteria and the CHECK constraint on the partitioning column. For example, have a look at the execution plan of the following query:

SELECT * FROM dbo.Customersv WHERE Country='US'

(Results abridged)
StmtText
------------------------------------------------------------------
SELECT CompanyName=CompanyName FROM dbo.CustomersV WHERE Country=@
  |--Compute Scalar(DEFINE:(CustomersUS.CompanyName=CustomersUS.Co
       |--Clustered Index Scan(OBJECT:(Northwind.dbo.CustomersUS.P

Even though we reference the view in our query, the optimizer figures out that the data we're requesting could only reside in one of the view's underlying tables, so it queries that table directly and omits the others from the query plan. It uses the WHERE clause criteria and the partitioning column in each table (its primary key) to make this determination.

There are a number of requirements that a view and its base tables must meet in order for the view to be a partitioned view in the first place and for the optimizer to be able to optimize queries against it in this way. The number and significance of these requirements have, in fact, discouraged many users from taking advantage of partitioned views, especially LPVs. Whether you use them is a judgment call you'll have to make; you can read up on the requirements and limitations associated with partitioned views in Books Online. Regarding the optimizer's ability to use the partitioning column to identify the correct base table to search for data, it has been my experience that the partitioning column should be the leftmost in the primary key. This bears further examination. Consider the partitioned view and query shown in Listing 17.2.

Listing 17.2. A Partitioning Column/Primary Key Mismatch
CREATE TABLE Orders1996 (
  OrderID int PRIMARY KEY NOT NULL ,
  CustomerID nchar (5) NULL ,
  EmployeeID int NULL ,
  OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1996),
  OrderYear int NOT NULL CHECK (OrderYear=1996),
  RequiredDate datetime NULL ,
  ShippedDate datetime NULL ,
  ShipVia int NULL
)
GO

CREATE TABLE Orders1997 (
  OrderID int PRIMARY KEY NOT NULL ,
  CustomerID nchar (5) NULL ,
  EmployeeID int NULL ,
  OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1997),
  OrderYear int NOT NULL CHECK (OrderYear=1997),
  RequiredDate datetime NULL ,
  ShippedDate datetime NULL ,
  ShipVia int NULL
)
GO

CREATE TABLE Orders1998 (
  OrderID int PRIMARY KEY NOT NULL ,
  CustomerID nchar (5) NULL ,
  EmployeeID int NULL ,
  OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1998),
  OrderYear int NOT NULL CHECK (OrderYear=1998),
  RequiredDate datetime NULL ,
  ShippedDate datetime NULL ,
  ShipVia int NULL
)
GO

CREATE VIEW OrdersV
AS
SELECT * FROM Orders1996
UNION ALL
SELECT * FROM Orders1997
UNION ALL
SELECT * FROM Orders1998
GO

SELECT * FROM OrdersV WHERE OrderYear=1997

Will the optimizer be able to narrow its search to just the Orders1997 partition? Let's look at the execution plan:

(Results abridged)

------------------------------------------------------------------
Executes    StmtText
----------- ------------------------------------------------------
1           SELECT * FROM [OrdersV] WHERE [OrderYear]=@1
1             |--Concatenation
1                  |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199
0                  |    |--Clustered Index Scan(OBJECT:([Northwind
1                  |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199
1                  |    |--Clustered Index Scan(OBJECT:([Northwind
1                  |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199
0                       |--Clustered Index Scan(OBJECT:([Northwind

At first glance, it might appear that even though we're querying data from only one of the partitions, the query plan includes searches for all three of them—but have a closer look. Note the Executes column. It indicates the number of times the corresponding query step was executed. In this case, it tells us that only one of the tables is actually searched. The other two have a 0 in their Executes column. This means that even though the partition could not be eliminated at compile-time, the server was able to get rid of it at runtime. Runtime partition elimination is a very handy feature that allows a plan to be used with a greater number of input parameters than one in which partitions are eliminated when a plan is first compiled. When a partition is eliminated at compile-time, it can service only user queries that end up querying the same partition. With runtime partition elimination, the plan can be used for any potential parameter value, regardless of the partition it eventually routes to.

Despite the fact that runtime partition elimination generally leads to better plan reuse, you may be wondering why the optimizer doesn't eliminate the unneeded partitions at compile-time. In order to get the optimizer to eliminate the unused partitions at compile-time, we need to include all the columns from the primary key in the query's filter criteria. Listing 17.3 shows such a revision and its resultant execution plan.

Listing 17.3. Matching Up the Partitioning Column and Primary Key to Eliminate Partitions from the Execution Plan
SELECT * FROM OrdersV WHERE OrderYear=1997 AND OrderID=1000

(Results abridged)

StmtText
------------------------------------------------------------------
SELECT * FROM OrdersV WHERE OrderYear=@1 AND OrderID=@2
|-Compute Scalar(DEFINE:(Orders1997.OrderID=Orders1997.OrderID, Or
  |-Clustered Index Scan(OBJECT:(Northwind.dbo.Orders1997.PK_Order

Now we see compile-time partition elimination. Not only is the partitioning column included in its host table's primary key but all the columns in the primary key are included in the query's search criteria as well. In this case, merely matching query columns to primary key columns left to right wasn't enough to achieve the compile-time partition elimination we were striving for—the query had to include all the columns in the primary key or the optimizer produced an inefficient plan. Listing 17.4 presents a variation on our partitioned view and query that demonstrates the same concept.

Listing 17.4. Adding a Column to the Primary Key to Prevent Compile-Time Partition Elimination
CREATE TABLE Orders1996 (
  OrderID int NOT NULL ,
  CustomerID nchar (5) NOT NULL ,
  EmployeeID int NULL ,
  OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1996),
  OrderYear int NOT NULL DEFAULT 1996 CHECK (OrderYear=1996),
  RequiredDate datetime NULL ,
  ShippedDate datetime NULL ,
  ShipVia int NULL,
  CONSTRAINT PK_Orders1996
  PRIMARY KEY (OrderYear, OrderID, CustomerId)
)
GO

CREATE TABLE Orders1997 (
  OrderID int NOT NULL ,
  CustomerID nchar (5) NOT NULL ,
  EmployeeID int NULL ,
  OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1997),
  OrderYear int NOT NULL DEFAULT 1997 CHECK (OrderYear=1997),
  RequiredDate datetime NULL ,
  ShippedDate datetime NULL ,
  ShipVia int NULL,
  CONSTRAINT PK_Orders1997
  PRIMARY KEY (OrderYear, OrderID, CustomerId)
)
GO

CREATE TABLE Orders1998 (
  OrderID int NOT NULL ,
  CustomerID nchar (5) NOT NULL ,
  EmployeeID int NULL ,
  OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1998),
  OrderYear int NOT NULL DEFAULT 1998 CHECK (OrderYear=1998),
  RequiredDate datetime NULL ,
  ShippedDate datetime NULL ,
  ShipVia int NULL,
  CONSTRAINT PK_Orders1998
  PRIMARY KEY (OrderYear, OrderID, CustomerId)
)
GO

CREATE VIEW OrdersV
AS
SELECT * FROM Orders1996
UNION ALL
SELECT * FROM Orders1997
UNION ALL
SELECT * FROM Orders1998
GO

SELECT * FROM OrdersV WHERE OrderYear=1997 AND OrderID=1000

(Results abridged)

------------------------------------------------------------------
Executes    StmtText
----------- ------------------------------------------------------
1           SELECT * FROM [OrdersV] WHERE [OrderYear]=@1 AND [Orde
1             |--Concatenation
1                  |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199
0                  |    |--Clustered Index Seek(OBJECT:([Northwind
1                  |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199
1                  |    |--Clustered Index Seek(OBJECT:([Northwind
1                  |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199
0                       |--Clustered Index Seek(OBJECT:([Northwind

Here, we've added the CustomerID column to each partition's primary key, but we haven't changed the query to include this new column in its search criteria. The result is a plan in which the optimizer eliminates the unneeded partitions at runtime rather than at compile-time (note the Executes column). Let's see what happens when we add the CustomerID to the query's search criteria (Listing 17.5).

Listing 17.5. Adding CustomerID to the Search Criteria to Facilitate Compile-Time Elimination
SELECT * FROM OrdersV
WHERE OrderYear=1997 AND OrderID=1000 AND CustomerID = 'AAAAA'

StmtText
------------------------------------------------------------------
SELECT * FROM OrdersV WHERE OrderYear=@1 AND OrderID=@2 AND Custom
 |-Compute Scalar(DEFINE:(Orders1997.OrderID=Orders1997.OrderID, O
  |-Clustered Index Scan(OBJECT:(Northwind.dbo.Orders1997.PK_Order

Once again, we're seeing compile-time partition elimination because we've matched the query criteria with the primary columns of the partitioned view one-to-one. In this case, we achieved the compile-time partition elimination by adding columns to the search criteria. We could just as easily have removed columns from the primary key of each partition.

As the very first example in this section illustrated, it's not a requirement that you must always filter by the entire primary key in order to get an efficient execution plan when querying a partitioned view. Just be aware that this may be necessary in order to force the optimizer to eliminate partitions at compile-time rather than runtime.

BETWEEN and Partitioned View Queries

In addition to the primary key–partition column relationship, another thing to watch out for with partitioned views is the use of theta (nonequality) operators. Even if the operators in the partitioning CHECK constraint and the query match exactly, the optimizer may be unable to correctly identify the partition to search when theta operators are used. This means that it may have to search all partitions and concatenate the results. For example, consider the partitioned view and query shown in Listing 17.6.

Listing 17.6. A Query That Features a BETWEEN Predicate against a Partitioned View
CREATE TABLE CustomersUS (
  CustomerID nchar (5) NOT NULL,
  CompanyName nvarchar (40) NOT NULL ,
  ContactName nvarchar (30) NULL ,
  ContactTitle nvarchar (30) NULL ,
  Address nvarchar (60) NULL ,
  City nvarchar (15) NULL ,
  Region nvarchar (15) NULL ,
  PostalCode nvarchar (10) NULL ,
  Country nvarchar (15) NOT NULL CHECK (Country='US'),
  Phone nvarchar (24) NULL ,
  Fax nvarchar (24) NULL,
PRIMARY KEY (Country, CustomerID)
)

CREATE TABLE CustomersUK (
  CustomerID nchar (5) NOT NULL,
  CompanyName nvarchar (40) NOT NULL ,
  ContactName nvarchar (30) NULL ,
  ContactTitle nvarchar (30) NULL ,
  Address nvarchar (60) NULL ,
  City nvarchar (15) NULL ,
  Region nvarchar (15) NULL ,
  PostalCode nvarchar (10) NULL ,
  Country nvarchar (15) NOT NULL CHECK (Country='UK'),
  Phone nvarchar (24) NULL ,
  Fax nvarchar (24) NULL,
PRIMARY KEY (Country, CustomerID)
)

CREATE TABLE CustomersFrance (
  CustomerID nchar (5) NOT NULL,
  CompanyName nvarchar (40) NOT NULL ,
  ContactName nvarchar (30) NULL ,
  ContactTitle nvarchar (30) NULL ,
  Address nvarchar (60) NULL ,
  City nvarchar (15) NULL ,
  Region nvarchar (15) NULL ,
  PostalCode nvarchar (10) NULL ,
  Country nvarchar (15) NOT NULL CHECK (Country='France'),
  Phone nvarchar (24) NULL ,
  Fax nvarchar (24) NULL,
PRIMARY KEY (Country, CustomerID)
)

GO

DROP VIEW CustomersV
GO

CREATE VIEW CustomersV
AS
SELECT * FROM dbo.CustomersUS
UNION ALL
SELECT * FROM dbo.CustomersUK
UNION ALL
SELECT * FROM dbo.CustomersFrance
GO

SELECT * FROM dbo.CustomersV WHERE Country BETWEEN 'UK' AND 'US'

Even though the view's partitioning column is each underlying table's primary key and even though the query filter criteria and the partitioning CHECK constraint match exactly, here's the execution plan we get:

(Results abridged)

Executes  StmtText
--------- --------------------------------------------------------
1         SELECT * FROM [dbo].[CustomersV] WHERE [Country]>=@1 AND
1           |--Concatenation
1                |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='US'
1                |    |--Clustered Index Seek(OBJECT:([Northwind].
1                |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='UK'
1                |    |--Clustered Index Seek(OBJECT:([Northwind].
1                |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='Fra
0                     |--Clustered Index Seek(OBJECT:([Northwind].

The CustomersFrance base table is mentioned in the plan even though it's not logically possible that CustomersFrance contains the data we're seeking. Note, however, that CustomersFrance is not actually searched (its Clustered Index Seek operator has 0 executions). Once again, the optimizer has generated a plan that eliminates unneeded partitions at runtime. This is a more flexible and generally more useful plan than one in which they are eliminated when the plan is initially compiled.

Distributed Partitioned Views

A DPV is a partitioned view whose base tables are scattered across a federation (a group) of autonomous servers. These remote base tables are accessed via linked server definitions. To set up a DPV, follow these steps.

1.
Create linked server definitions for the servers on which the remote tables you want to access reside.

2.
Enable the lazy schema validation server option for each linked server. This option cannot be set using the Linked Server Properties dialog in Enterprise Manager, so you must use sp_serveroption.

3.
Create a partitioned view that references the remote partitions using four part names.

4.
Repeat these steps on each of the linked servers referenced by the partitioned view. Doing this will allow you to load-balance your SQL Server environment by routing users to different versions of the same view.

Listing 17.7 shows our earlier partitioned view converted to a DPV.

Listing 17.7. A Distributed Partitioned View in the Wild
CREATE VIEW OrdersV
AS
SELECT * FROM Orders1996
UNION ALL
SELECT * FROM HOMER.Northwind.dbo.Orders1997
UNION ALL
SELECT * FROM MARGE.Northwind.dbo.Orders1998
GO

SELECT CustomerID FROM OrdersV WHERE OrderYear=1997 AND
    OrderID=1000
(Results abridged)
StmtText
-------------------------------------------------------------------------------
SELECT CustomerID=CustomerID FROM OrdersV WHERE OrderYear=@1 AND OrderID=@2
 |-Compute Scalar(DEFINE:(HOMER.Northwind.dbo.Orders1997.CustomerID=HOMER.nort
  |-Remote Query(SOURCE:(HOMER),QUERY:(SELECT Col1024 FROM (SELECT Tbl1003.
    "OrderID" Col1023,Tbl1003."CustomerID" Col1024,Tbl1003."OrderYear" Col1027
    FROM "northwind"."dbo"."Orders1997" Tbl1003) Qry1031 WHERE Col1023=(1000)))

As you can see, given that we've properly matched up the query's filter criteria with the partitioned view's primary key, the optimizer correctly focuses the search on just one of the partitions. Since that partition resides on a linked server, the optimizer adds a Remote Query step to the plan and sends the query to the remote server. Note that the WHERE clause of the remote query (bolded) does not include the partition column even though the original query does. This is because it isn't needed. Once the correct partition has been identified, the partition column itself isn't needed to locate the data in the remote table. By virtue of the CHECK constraint, the optimizer knows that Orders1997 contains data for only one partition, 1997.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.224.62.105