Simply put, a partitioned view is a view that unions tables together that serve as partitions—or sections—of a larger set of data. For example, a partitioned view over the activity on a Web site might union together separate tables for each month of the year. Each of these tables would store the activity data for a particular month. By unioning them together, the partitioned view would allow them to be treated as a single table while still keeping their sizes manageable.
There are two types of partitioned views—local partitioned views (LPVs) and distributed partitioned views (DPVs). An LPV is a view in which all the underlying tables reside on the same SQL Server instance. A DPV is a view in which they live on separate instances. These instances don't have to be on different machines, but usually they are. Spreading a DPV across multiple machines helps “scale out” large SQL Server implementations. It can effectively bring the processing power and resources of many machines together to process a single query.
A partitioned view is a normal view object that unions together tables with certain attributes. It is characterized by tables with the same structure that are combined to provide a unified view of a set of data but are segmented on a clearly defined “partitioning column.” This partitioning column is set up using a CHECK constraint. In addition to doing what CHECK constraints always do—namely, controlling the type of data a column will accept—the partitioning column of an LPV or a DPV provides a means for the SQL Server query optimizer to be able to determine which partition a given column value resides in as it creates a query plan. This allows it to eliminate searches for the other partitions from the query plan when optimizing a query that includes the partitioning column.
This is best understood by way of example (Listing 17.1).
CREATE TABLE CustomersUS ( CustomerID nchar (5) NOT NULL, CompanyName nvarchar (40) NOT NULL , ContactName nvarchar (30) NULL , ContactTitle nvarchar (30) NULL , Address nvarchar (60) NULL , City nvarchar (15) NULL , Region nvarchar (15) NULL , PostalCode nvarchar (10) NULL , Country nvarchar (15) NOT NULL CHECK (Country='US'), Phone nvarchar (24) NULL , Fax nvarchar (24) NULL, CONSTRAINT PK_CustUS PRIMARY KEY (Country, CustomerID) ) CREATE TABLE CustomersUK ( CustomerID nchar (5) NOT NULL, CompanyName nvarchar (40) NOT NULL , ContactName nvarchar (30) NULL , ContactTitle nvarchar (30) NULL , Address nvarchar (60) NULL , City nvarchar (15) NULL , Region nvarchar (15) NULL , PostalCode nvarchar (10) NULL , Country nvarchar (15) NOT NULL CHECK (Country='UK'), Phone nvarchar (24) NULL , Fax nvarchar (24) NULL, CONSTRAINT PK_CustUK PRIMARY KEY (Country, CustomerID) ) CREATE TABLE CustomersFrance ( CustomerID nchar (5) NOT NULL, CompanyName nvarchar (40) NOT NULL , ContactName nvarchar (30) NULL , ContactTitle nvarchar (30) NULL , Address nvarchar (60) NULL , City nvarchar (15) NULL , Region nvarchar (15) NULL , PostalCode nvarchar (10) NULL , Country nvarchar (15) NOT NULL CHECK (Country='France'), Phone nvarchar (24) NULL , Fax nvarchar (24) NULL, CONSTRAINT PK_CustFR PRIMARY KEY (Country, CustomerID) ) GO DROP VIEW CustomersV GO CREATE VIEW CustomersV AS SELECT * FROM dbo.CustomersUS UNION ALL SELECT * FROM dbo.CustomersUK UNION ALL SELECT * FROM dbo.CustomersFrance GO |
As you can see, we create three tables to store partitions, or slices, of a customer table. We then union those tables back together using a partitioned view. What's the point? Why not store the data as a single table? There are two advantages to using this approach: (1) By segmenting the customer data, we keep the tables more manageable—the individual partitions will be a fraction of the size of the entire customer table; and (2) SQL Server's query optimizer can recognize partitioned views and automatically determine the right underlying table to query based on the filter criteria and the CHECK constraint on the partitioning column. For example, have a look at the execution plan of the following query:
SELECT * FROM dbo.Customersv WHERE Country='US'
StmtText
------------------------------------------------------------------
SELECT CompanyName=CompanyName FROM dbo.CustomersV WHERE Country=@
|--Compute Scalar(DEFINE:(CustomersUS.CompanyName=CustomersUS.Co
|--Clustered Index Scan(OBJECT:(Northwind.dbo.CustomersUS.P
Even though we reference the view in our query, the optimizer figures out that the data we're requesting could only reside in one of the view's underlying tables, so it queries that table directly and omits the others from the query plan. It uses the WHERE clause criteria and the partitioning column in each table (its primary key) to make this determination.
There are a number of requirements that a view and its base tables must meet in order for the view to be a partitioned view in the first place and for the optimizer to be able to optimize queries against it in this way. The number and significance of these requirements have, in fact, discouraged many users from taking advantage of partitioned views, especially LPVs. Whether you use them is a judgment call you'll have to make; you can read up on the requirements and limitations associated with partitioned views in Books Online. Regarding the optimizer's ability to use the partitioning column to identify the correct base table to search for data, it has been my experience that the partitioning column should be the leftmost in the primary key. This bears further examination. Consider the partitioned view and query shown in Listing 17.2.
CREATE TABLE Orders1996 ( OrderID int PRIMARY KEY NOT NULL , CustomerID nchar (5) NULL , EmployeeID int NULL , OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1996), OrderYear int NOT NULL CHECK (OrderYear=1996), RequiredDate datetime NULL , ShippedDate datetime NULL , ShipVia int NULL ) GO CREATE TABLE Orders1997 ( OrderID int PRIMARY KEY NOT NULL , CustomerID nchar (5) NULL , EmployeeID int NULL , OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1997), OrderYear int NOT NULL CHECK (OrderYear=1997), RequiredDate datetime NULL , ShippedDate datetime NULL , ShipVia int NULL ) GO CREATE TABLE Orders1998 ( OrderID int PRIMARY KEY NOT NULL , CustomerID nchar (5) NULL , EmployeeID int NULL , OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1998), OrderYear int NOT NULL CHECK (OrderYear=1998), RequiredDate datetime NULL , ShippedDate datetime NULL , ShipVia int NULL ) GO CREATE VIEW OrdersV AS SELECT * FROM Orders1996 UNION ALL SELECT * FROM Orders1997 UNION ALL SELECT * FROM Orders1998 GO SELECT * FROM OrdersV WHERE OrderYear=1997 |
Will the optimizer be able to narrow its search to just the Orders1997 partition? Let's look at the execution plan:
(Results abridged)
------------------------------------------------------------------ Executes StmtText ----------- ------------------------------------------------------ 1 SELECT * FROM [OrdersV] WHERE [OrderYear]=@1 1 |--Concatenation 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199 0 | |--Clustered Index Scan(OBJECT:([Northwind 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199 1 | |--Clustered Index Scan(OBJECT:([Northwind 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199 0 |--Clustered Index Scan(OBJECT:([Northwind
At first glance, it might appear that even though we're querying data from only one of the partitions, the query plan includes searches for all three of them—but have a closer look. Note the Executes column. It indicates the number of times the corresponding query step was executed. In this case, it tells us that only one of the tables is actually searched. The other two have a 0 in their Executes column. This means that even though the partition could not be eliminated at compile-time, the server was able to get rid of it at runtime. Runtime partition elimination is a very handy feature that allows a plan to be used with a greater number of input parameters than one in which partitions are eliminated when a plan is first compiled. When a partition is eliminated at compile-time, it can service only user queries that end up querying the same partition. With runtime partition elimination, the plan can be used for any potential parameter value, regardless of the partition it eventually routes to.
Despite the fact that runtime partition elimination generally leads to better plan reuse, you may be wondering why the optimizer doesn't eliminate the unneeded partitions at compile-time. In order to get the optimizer to eliminate the unused partitions at compile-time, we need to include all the columns from the primary key in the query's filter criteria. Listing 17.3 shows such a revision and its resultant execution plan.
SELECT * FROM OrdersV WHERE OrderYear=1997 AND OrderID=1000 (Results abridged) StmtText ------------------------------------------------------------------ SELECT * FROM OrdersV WHERE OrderYear=@1 AND OrderID=@2 |-Compute Scalar(DEFINE:(Orders1997.OrderID=Orders1997.OrderID, Or |-Clustered Index Scan(OBJECT:(Northwind.dbo.Orders1997.PK_Order |
Now we see compile-time partition elimination. Not only is the partitioning column included in its host table's primary key but all the columns in the primary key are included in the query's search criteria as well. In this case, merely matching query columns to primary key columns left to right wasn't enough to achieve the compile-time partition elimination we were striving for—the query had to include all the columns in the primary key or the optimizer produced an inefficient plan. Listing 17.4 presents a variation on our partitioned view and query that demonstrates the same concept.
CREATE TABLE Orders1996 ( OrderID int NOT NULL , CustomerID nchar (5) NOT NULL , EmployeeID int NULL , OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1996), OrderYear int NOT NULL DEFAULT 1996 CHECK (OrderYear=1996), RequiredDate datetime NULL , ShippedDate datetime NULL , ShipVia int NULL, CONSTRAINT PK_Orders1996 PRIMARY KEY (OrderYear, OrderID, CustomerId) ) GO CREATE TABLE Orders1997 ( OrderID int NOT NULL , CustomerID nchar (5) NOT NULL , EmployeeID int NULL , OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1997), OrderYear int NOT NULL DEFAULT 1997 CHECK (OrderYear=1997), RequiredDate datetime NULL , ShippedDate datetime NULL , ShipVia int NULL, CONSTRAINT PK_Orders1997 PRIMARY KEY (OrderYear, OrderID, CustomerId) ) GO CREATE TABLE Orders1998 ( OrderID int NOT NULL , CustomerID nchar (5) NOT NULL , EmployeeID int NULL , OrderDate datetime NOT NULL CHECK (Year(OrderDate)=1998), OrderYear int NOT NULL DEFAULT 1998 CHECK (OrderYear=1998), RequiredDate datetime NULL , ShippedDate datetime NULL , ShipVia int NULL, CONSTRAINT PK_Orders1998 PRIMARY KEY (OrderYear, OrderID, CustomerId) ) GO CREATE VIEW OrdersV AS SELECT * FROM Orders1996 UNION ALL SELECT * FROM Orders1997 UNION ALL SELECT * FROM Orders1998 GO SELECT * FROM OrdersV WHERE OrderYear=1997 AND OrderID=1000 (Results abridged) ------------------------------------------------------------------ Executes StmtText ----------- ------------------------------------------------------ 1 SELECT * FROM [OrdersV] WHERE [OrderYear]=@1 AND [Orde 1 |--Concatenation 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199 0 | |--Clustered Index Seek(OBJECT:([Northwind 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199 1 | |--Clustered Index Seek(OBJECT:([Northwind 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])=199 0 |--Clustered Index Seek(OBJECT:([Northwind |
Here, we've added the CustomerID column to each partition's primary key, but we haven't changed the query to include this new column in its search criteria. The result is a plan in which the optimizer eliminates the unneeded partitions at runtime rather than at compile-time (note the Executes column). Let's see what happens when we add the CustomerID to the query's search criteria (Listing 17.5).
SELECT * FROM OrdersV WHERE OrderYear=1997 AND OrderID=1000 AND CustomerID = 'AAAAA' StmtText ------------------------------------------------------------------ SELECT * FROM OrdersV WHERE OrderYear=@1 AND OrderID=@2 AND Custom |-Compute Scalar(DEFINE:(Orders1997.OrderID=Orders1997.OrderID, O |-Clustered Index Scan(OBJECT:(Northwind.dbo.Orders1997.PK_Order |
Once again, we're seeing compile-time partition elimination because we've matched the query criteria with the primary columns of the partitioned view one-to-one. In this case, we achieved the compile-time partition elimination by adding columns to the search criteria. We could just as easily have removed columns from the primary key of each partition.
As the very first example in this section illustrated, it's not a requirement that you must always filter by the entire primary key in order to get an efficient execution plan when querying a partitioned view. Just be aware that this may be necessary in order to force the optimizer to eliminate partitions at compile-time rather than runtime.
In addition to the primary key–partition column relationship, another thing to watch out for with partitioned views is the use of theta (nonequality) operators. Even if the operators in the partitioning CHECK constraint and the query match exactly, the optimizer may be unable to correctly identify the partition to search when theta operators are used. This means that it may have to search all partitions and concatenate the results. For example, consider the partitioned view and query shown in Listing 17.6.
CREATE TABLE CustomersUS ( CustomerID nchar (5) NOT NULL, CompanyName nvarchar (40) NOT NULL , ContactName nvarchar (30) NULL , ContactTitle nvarchar (30) NULL , Address nvarchar (60) NULL , City nvarchar (15) NULL , Region nvarchar (15) NULL , PostalCode nvarchar (10) NULL , Country nvarchar (15) NOT NULL CHECK (Country='US'), Phone nvarchar (24) NULL , Fax nvarchar (24) NULL, PRIMARY KEY (Country, CustomerID) ) CREATE TABLE CustomersUK ( CustomerID nchar (5) NOT NULL, CompanyName nvarchar (40) NOT NULL , ContactName nvarchar (30) NULL , ContactTitle nvarchar (30) NULL , Address nvarchar (60) NULL , City nvarchar (15) NULL , Region nvarchar (15) NULL , PostalCode nvarchar (10) NULL , Country nvarchar (15) NOT NULL CHECK (Country='UK'), Phone nvarchar (24) NULL , Fax nvarchar (24) NULL, PRIMARY KEY (Country, CustomerID) ) CREATE TABLE CustomersFrance ( CustomerID nchar (5) NOT NULL, CompanyName nvarchar (40) NOT NULL , ContactName nvarchar (30) NULL , ContactTitle nvarchar (30) NULL , Address nvarchar (60) NULL , City nvarchar (15) NULL , Region nvarchar (15) NULL , PostalCode nvarchar (10) NULL , Country nvarchar (15) NOT NULL CHECK (Country='France'), Phone nvarchar (24) NULL , Fax nvarchar (24) NULL, PRIMARY KEY (Country, CustomerID) ) GO DROP VIEW CustomersV GO CREATE VIEW CustomersV AS SELECT * FROM dbo.CustomersUS UNION ALL SELECT * FROM dbo.CustomersUK UNION ALL SELECT * FROM dbo.CustomersFrance GO SELECT * FROM dbo.CustomersV WHERE Country BETWEEN 'UK' AND 'US' |
Even though the view's partitioning column is each underlying table's primary key and even though the query filter criteria and the partitioning CHECK constraint match exactly, here's the execution plan we get:
(Results abridged)
Executes StmtText --------- -------------------------------------------------------- 1 SELECT * FROM [dbo].[CustomersV] WHERE [Country]>=@1 AND 1 |--Concatenation 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='US' 1 | |--Clustered Index Seek(OBJECT:([Northwind]. 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='UK' 1 | |--Clustered Index Seek(OBJECT:([Northwind]. 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='Fra 0 |--Clustered Index Seek(OBJECT:([Northwind].
The CustomersFrance base table is mentioned in the plan even though it's not logically possible that CustomersFrance contains the data we're seeking. Note, however, that CustomersFrance is not actually searched (its Clustered Index Seek operator has 0 executions). Once again, the optimizer has generated a plan that eliminates unneeded partitions at runtime. This is a more flexible and generally more useful plan than one in which they are eliminated when the plan is initially compiled.
A DPV is a partitioned view whose base tables are scattered across a federation (a group) of autonomous servers. These remote base tables are accessed via linked server definitions. To set up a DPV, follow these steps.
1. | Create linked server definitions for the servers on which the remote tables you want to access reside. |
2. | Enable the lazy schema validation server option for each linked server. This option cannot be set using the Linked Server Properties dialog in Enterprise Manager, so you must use sp_serveroption. |
3. | Create a partitioned view that references the remote partitions using four part names. |
4. | Repeat these steps on each of the linked servers referenced by the partitioned view. Doing this will allow you to load-balance your SQL Server environment by routing users to different versions of the same view. |
Listing 17.7 shows our earlier partitioned view converted to a DPV.
CREATE VIEW OrdersV
AS
SELECT * FROM Orders1996
UNION ALL
SELECT * FROM HOMER.Northwind.dbo.Orders1997
UNION ALL
SELECT * FROM MARGE.Northwind.dbo.Orders1998
GO
SELECT CustomerID FROM OrdersV WHERE OrderYear=1997 AND
OrderID=1000
(Results abridged)
StmtText
-------------------------------------------------------------------------------
SELECT CustomerID=CustomerID FROM OrdersV WHERE OrderYear=@1 AND OrderID=@2
|-Compute Scalar(DEFINE:(HOMER.Northwind.dbo.Orders1997.CustomerID=HOMER.nort
|-Remote Query(SOURCE:(HOMER),QUERY:(SELECT Col1024 FROM (SELECT Tbl1003.
"OrderID" Col1023,Tbl1003."CustomerID" Col1024,Tbl1003."OrderYear" Col1027
FROM "northwind"."dbo"."Orders1997" Tbl1003) Qry1031 WHERE Col1023=(1000)))
|
As you can see, given that we've properly matched up the query's filter criteria with the partitioned view's primary key, the optimizer correctly focuses the search on just one of the partitions. Since that partition resides on a linked server, the optimizer adds a Remote Query step to the plan and sends the query to the remote server. Note that the WHERE clause of the remote query (bolded) does not include the partition column even though the original query does. This is because it isn't needed. Once the correct partition has been identified, the partition column itself isn't needed to locate the data in the remote table. By virtue of the CHECK constraint, the optimizer knows that Orders1997 contains data for only one partition, 1997.
18.224.62.105