13.6. Best practice considerations: index design and maintenance

Poor index design and maintenance strategies are a significant contributor to reduced database performance. Classic index-related problems include too many indexes, high fragmentation levels, and missing indexes. Fortunately, SQL Server includes a number of tools and processes for improving index design and maintenance.

  • With the possible exception of small temporary tables, or those used for insert activity only, all tables should be created with a physical order, achieved through the use of a clustered index. Clustered tables enable fragmentation to be controlled without user outage and enable fast lookups when querying on the clustered key(s).

  • Given that the clustered index key is included in each nonclustered index as the row locator, from a size and performance perspective, wide-clustered index keys should be avoided in favor of narrower ones. The ideal clustered index key is narrow, unique, and holds no intrinsic business value, therefore avoiding updates and the subsequent need to maintain nonclustered index row locators. For these reasons, identity-based columns are often used for clustered surrogate primary keys.

  • When creating tables, creating the clustered index before nonclustered indexes will prevent the need for nonclustered index rebuilds.

  • A common database design practice is to use globally unique identifier (GUID) columns as primary keys, which by default will also be the table's clustered index unless specified otherwise. Not only are GUIDs wide (16 bytes), they're randomly generated. Given such tables are clustered on the GUID column, newly inserted rows will be randomly positioned throughout the table, leading to page splits and subsequent fragmentation. This is a particular concern for tables with a high volume of data inserts. SQL Server 2005 introduced the NewSequentialID() function, which partially offsets this problem by generating GUIDs in a sequential manner. Removing the "randomness" from the GUID values helps in reducing both page splits and fragmentation.

  • Columns frequently used for range scans and sort operations are good candidates for a clustered index, for example, an order date column used to return orders within a date range. A clustered index on this column may avoid the need for both scan and sort operations.

  • Look for opportunities to cover queries using either additional nonclustered keys or included columns. Covered indexes avoid the need for key/RID lookup operations, improving performance substantially and increasing the likelihood of a nonclustered index being used.

  • Filtered indexes are excellent for creating small and fast indexes on tables with subsets of data accessed using simple predicates commonly supplied during searches. They also enable segregated index maintenance, more accurate statistics, and the ability to create unique indexes on columns containing null values (but whose not-null values are unique). However, if most of a table's rows would be included in the filtered index, a full table index is most likely a better option.

  • Indexed views achieve similar benefits to filtered indexes but enable more complex predicates and can span more than one table. They are ideal when using aggregate functions, but because of the cost of maintaining the data in line with the base tables, they are best used when the base table(s) are infrequently modified. The maintenance costs of maintaining indexed views on heavily modified tables may outweigh the benefits and should therefore be carefully considered and measured before production implementation.

  • Loading up a database with indexes in the hope that performance will eventually improve is a flawed and dangerous approach. A measured approach using views and functions such as sys.dm_db_index_usage_stats and the sys.dm_db_missing_index group offers a much better qualitative approach as part of an ongoing, proactive maintenance regime.

  • Consider all foreign keys as index candidates. Table lookups based on a foreign key column are very common, as are performance problems stemming from the lack of indexes on these columns.

  • For multicolumn indexes, placing the most selective column first usually results in the best performance, particularly when using like searches. In cases where doing so makes the index unusable, for example, when the column is rarely supplied as a search condition, then this is obviously not appropriate.

  • For very large join operations in specialist situations—for example, a data conversion process—consider the possibility of clustering each table on the join column. Doing so permits a merge join and can result in substantial performance gains.

  • Use the sys.dm_db_index_usage_stats DMV to determine the maintenance cost of an index compared to its usage benefits. This DMV is best used as part of a regular index-analysis maintenance routine. Bear in mind that the values reported by this DMV are applicable only since the last instance restart, so decisions based on its output should take this into consideration.

  • Identify and remove duplicate and/or overlapping indexes, but consider the possibility that they may be referenced in T-SQL code as index hints. Before dropping such indexes, script their definition or use the DISABLE function.

  • Using the SET STATISTICS XML ON command enables the query execution plan to be captured in XML format. This enables plan portability for support and analysis purposes, while also allowing inspection of missing indexes as reported by the query optimizer.

  • Using the sys.dm_db_missing_index DMVs allows retrospective inspection of all missing indexes as reported by the query optimizer since the last instance restart. By using the cumulative statistics in the index_group_stats DMV, missing indexes can be prioritized in order of potential value and considered as candidates for addition. Like the index usage inspection, this can be part of a regular maintenance routine.

  • The Database Engine Tuning Advisor overcomes some of the limitations of the missing index DMVs such as the maximum of 500 stored missing indexes and their inability to suggest clustered/filtered indexes and indexed views. Further, its input can be a profiler trace of actual production activity, captured over the appropriate timeframe.

  • For all index additions/removals, measuring the performance impact in a volume test environment before production implementation is crucial in avoiding unexpected problems.

  • The sys.dm_db_index_physical_stats function can be used to identify an index's fragmentation level and should be used in place of the older, limited functionality offered by DBCC SHOWCONTIG.

  • For operations such as large data loads where indexes are dropped and re-created around the load, disabling indexes rather than dropping them enables the index re-creation step to be much simpler by excluding the need to store the index definition. A disabled index retains its metadata, so a simple rebuild command is sufficient to restore the index.

  • Use the ALTER INDEX REORGANIZE command in place of the older DBCC INDEXDEFRAG command to correct index fragmentation levels of up to 30 percent.

  • Use the ALTER INDEX REBUILD command in place of the older DBCC DBREINDEX command to correct index fragmentation of greater than 30 percent.

  • Avoid unnecessary index rebuilds. Unnecessary index rebuilds deliver limited benefits while increasing transaction log usage and potential user impact. Increased transaction log usage is of prime concern with database-mirroring solutions, particularly when in high-safety (synchronous) mode.

  • If you rebuild indexes only in order to maintain accurate statistics, consider the statistics maintenance commands with the FULLSCAN or Sample 100 options.

  • The ONLINE = ON option of ALTER INDEX REBUILD is available in the Enterprise edition of SQL Server, making it the ideal choice for low user impact index maintenance in 24/7 environments or those with limited maintenance windows.

  • Using CREATE WITH DROP_EXISTING is an ideal alternative to dropping and re-creating clustered indexes with a different structure. Dropping and re-creating a clustered index incurs a double rebuild impact on existing nonclustered indexes, a process avoided by using the DROP_EXISTING command.

  • Avoid the classic mistake of updating statistics following an index rebuild. An index rebuild automatically performs a full statistics update, so performing another one is at best unnecessary, and at worst lowers statistics accuracy by replacing full statistics with a sampled set.

  • A lower fill factor (80-90 percent) may be appropriate for indexes on tables with high amounts of update activity. While a lower fill factor reduces fragmentation and page splits, it also increases index size and page reads, so the performance improvement (or decrease) should be carefully measured in a volume-testing environment before production implementation.

  • The SORT_IN_TEMPDB option may improve index rebuild performance if the tempdb database is located on dedicated disks. If you're using this option, be sure to capacity plan the disk space requirements for tempdb to prevent index rebuilds failing from lack of disk space. Again, the best way to plan this is by using a volume-test environment configured identically to the production environment.

  • Leave the default automatic statistics settings in place. If you're attempting to avoid occasional query execution delay due to automatic statistics operations, consider the asynchronous statistics mode as an alternative to disabling automatic statistic settings.

  • Following a SQL Server upgrade, a full statistics update on all indexes and columns is recommended to enable the enhanced query optimizer in SQL Server 2008 to take full advantage of the most accurate statistics.

  • In cases where combinations of nonindexed columns are frequently supplied as search predicates, for example, ProductType and Color, creating multicolumn statistics on the column combination can improve performance by enabling SQL Server to better estimate the likely row count and therefore choose the most appropriate join type. Doing so avoids the need to supply join hints in an attempt at optimizing performance.

  • In cases where calculations across multiple nonindexed columns are used as a predicate, for example, where price + tax > 1000, consider creating this as a calculated column. This will enable SQL Server to create statistics on the column for more accurate query plans.

  • In cases where indexes exist on columns with monotonically increasing values, for example, IDENTITY columns, and the table receives a large volume of inserts, the most recently inserted rows will fall outside the statistics histogram. In cases where the lack of statistics on the new rows is causing inaccurate query plans, consider manually updating statistics on a more frequent basis than that performed by the automatic statistics process.

Additional information on the best practices covered in this chapter can be found online at http://www.sqlCrunch.com/index.

In the next chapter we'll cover a number of techniques for automating some of the index maintenance tasks we've covered in this chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.149.232.152