CHAPTER 1

Metadata Collection

The first Integration Services design pattern we will cover is metadata collection. What do we mean by “metadata collection”? Good question. This chapter could also be called “Using SSIS to Save Time and Become an Awesome DBA.” Many DBAs spend a large portion of their time on monitoring activities such as verifying backups, alerting on scheduled job failures, creating schema snapshots (“just in case”), examining space utilization, and logging database growth over time, to name just a few. Most RDBMS systems provide metadata to help DBAs monitor their systems. If you’ve been a DBA for a few years, you may even have a “tool bag” of scripts that you use to interrogate metadata. Running these scripts manually is easy when you have just one or two servers; however, it can quickly become unwieldy and consume much of your time as your enterprise grows and the number of database servers increases.

This chapter examines how to use Integration Services and the metadata that exists within SQL Server to automate some of these routine tasks.

Introducing SQL Server Data Tools

One of the major features of SQL Server 2012 is the introduction of SQL Server Data Tools (SSDT). SSDT replaces Business Intelligence Development Studio (BIDS) and leverages the maturity of the Visual Studio product to provide a unified development platform for SQL Server, Business Intelligence (BI), and .NET applications. This book is written using SSDT, although the appearance of the Integration Services designer interface is largely the same as BIDS 2008. SSDT provides backward compatibility for Integration Services 2008 packages via the SSIS Package Upgrade Wizard.

Tip  Don’t have SQL Server Data Tools installed? SSDT is a free component of the SQL Server platform and is available to all SQL Server users. You can install SSDT from your SQL Server installation materials under the “Feature Selection” menu.

A Peek at the Final Product

Let’s discuss the Integration Services package we will be creating in this chapter.

In SQL Server, we will do the following:

  1. Create a database to act as our central repository for database monitoring.
  2. Create a table to store a list of SQL Server instances that we wish to monitor.
  3. Create a table for each of the data elements we wish to monitor (unused indexes and database growth).

In Integration Services, we will do the following:

  1. Create a new Integration Services package.
  2. Retrieve a list of SQL Server instances and store the list in a variable.
  3. Create an OLE DB connection with a dynamically populated server name.
  4. Iterate through each SQL Server instance and

    a.   Retrieve current database and log file sizes for historical monitoring.

    b.   Retrieve a list of index candidates for potential redesign or dropping.

    c.   Update the LastMonitored value for the SQL Server instance.

This is a very flexible model that can easily be expanded to include many more monitoring tasks. A screenshot of the completed package is displayed in Figure 1-1.

Figure 1-1. The MetadataCollection package

If this is not your first Integration Services package, you may have noticed that this package is missing a few best practices, such as error handling. In the interest of clarity, the package we create will focus only on core design patterns; however, we will call out best practices when applicable.

Also, please note that the T-SQL examples will only work with SQL Server 2005 or later.

SQL Server Metadata

Although metadata can be collected from any RDBMS that provides an interface for accessing it, this chapter uses SQL Server as its metadata source. The focus of this chapter is not on the actual metadata, but rather the pattern of metadata collection. Still, it is useful for you to have a basic understanding of the type of metadata that is available.

SQL Server exposes a wealth of information through catalog views, system functions, dynamic management views (DMVs), and dynamic management functions (DMFs). Let’s briefly examine some of the metadata we will be using in this chapter.

Tip  SQL Server Books Online is a great resource for learning more about the types of metadata available in SQL Server. Try searching for “metadata functions,” “catalog views,” and “DMVs” for more information.

sys.dm_os_performance_counters

The sys.dm_os_performance_counters DMV returns server performance counters on areas including memory, wait stats, and transactions. This DMV is useful for reporting file sizes, page life expectancy, page reads and writes per second, and transactions per second, to name but a few.
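
For example, a quick peek at a single counter requires only a filter on counter_name; note that counter_name is a fixed-length NCHAR column, so a LIKE comparison conveniently sidesteps its trailing spaces:

SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Total Server Memory%';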

sys.dm_db_index_usage_stats

The sys.dm_db_index_usage_stats DMV contains information on index utilization. Specifically, a counter is incremented every time an index has a seek, scan, lookup, or update performed. These counters are reinitialized whenever the SQL Server service is started. If you do not see a row in this DMV for a particular index, it means that a seek, scan, lookup, or update has not yet been performed since the last server reboot.
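
For example, the following query returns the usage counters for the indexes in the current database; it is a simple illustration and not part of the package we build later in this chapter:

SELECT OBJECT_NAME(ius.object_id) AS tableName
    , i.name AS indexName
    , ius.user_seeks
    , ius.user_scans
    , ius.user_lookups
    , ius.user_updates
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i
    ON i.object_id = ius.object_id
    AND i.index_id = ius.index_id
WHERE ius.database_id = DB_ID();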

sys.dm_os_sys_info

The sys.dm_os_sys_info DMV contains information about server resources. Perhaps the most frequently used piece of information in this DMV is the sqlserver_start_time column, which tells you the last time the SQL Server service was started.
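
For example:

SELECT sqlserver_start_time
FROM sys.dm_os_sys_info;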

sys.tables

The sys.tables catalog view contains information about every table that exists within the database.

sys.indexes

The sys.indexes catalog view contains information about every index in the database. This includes information such as whether an index is clustered or nonclustered and whether the index is unique or nonunique.
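
A quick way to see both of these catalog views in action is a simple join that lists every index on every user table in the current database; heaps appear with an index_id of 0 and a NULL index name:

SELECT t.name AS tableName
    , i.name AS indexName
    , i.index_id
    , i.type_desc
    , i.is_unique
    , i.is_primary_key
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id;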

sys.partitions

The sys.partitions catalog view gives visibility into the partitioning structure of an index. When an index has more than one partition, the data in the index is split into multiple physical structures that can be accessed using a single logical name. This technique is especially useful for dealing with large tables, such as a transaction history table. If a table is not partitioned, the table will still have a single row in sys.partitions.

sys.allocation_units

The sys.allocation_units catalog view contains information about the number of pages and rows that exist for an object. It can be joined to the sys.partitions catalog view on container_id = partition_id.
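
For example, the following returns the total pages allocated to each index, rolled up across partitions; it is essentially the same join we will use later in Listing 1-7:

SELECT p.object_id
    , p.index_id
    , SUM(au.total_pages) AS totalPages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
    ON au.container_id = p.partition_id
GROUP BY p.object_id, p.index_id;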

Setting Up the Central Repository

Before we can begin development on our Integration Services package, we need to set up some prerequisites in SQL Server. First and foremost, we need to create a database that will act as our central data repository. This is where our list of SQL Server instances will reside and where we will store the metadata we retrieve for each SQL Server instance. Many enterprises also find it convenient to store all error and package logging in this same central database. This is especially beneficial in environments where there are numerous DBAs, developers, and servers, as it makes it easy for everyone to know where to look for information. The T-SQL code in Listing 1-1 creates the database we will use throughout the rest of this chapter.

Listing 1-1.  Example of T-SQL Code to Create a SQL Server Database

USE [master];
GO

CREATE DATABASE [dbaCentralLogging]
    ON PRIMARY
    (
       NAME = N'dbaCentralLogging'
      ,FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\dbaCentralLogging.mdf'
     , SIZE = 1024MB
     , MAXSIZE = UNLIMITED
     , FILEGROWTH = 1024MB
    )
    LOG ON
    (
       NAME = N'dbaCentralLogging_log'
     , FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\dbaCentralLogging_log.ldf'
     , SIZE = 256MB
     , MAXSIZE = UNLIMITED
     , FILEGROWTH = 256MB
    );
GO

Please note that your file directory may differ from the one in the preceding example.

This code can be executed from SQL Server Management Studio (SSMS), as demonstrated in Figure 1-2, or from your favorite query tool.

Figure 1-2. SQL Server Management Studio 2012

Next, we need a list of SQL Server instances that need to be monitored. The easiest way to accomplish this is to store a list of database instance names in a file or table. We will use the latter method. Using the code in Listing 1-2, create that table now.

Listing 1-2.  Example of T-SQL Code to Create a Table for Monitoring SQL Server Instances

USE dbaCentralLogging;
GO

CREATE TABLE dbo.dba_monitor_SQLServerInstances
(
    SQLServerInstance NVARCHAR(128)
    ,LastMonitored SMALLDATETIME NULL

        CONSTRAINT PK_dba_monitor_SQLServerInstances
                PRIMARY KEY CLUSTERED(SQLServerInstance)
);

You will then need to populate the table with the list of SQL Server instances you wish to monitor. The code in Listing 1-3 will walk you through how to do this, although you will need to use real SQL Server instances.

Listing 1-3.  Example of T-SQL Code to Insert Data into the dba_monitor_SQLServerInstances Table

INSERT INTO dbo.dba_monitor_SQLServerInstances
(
          SQLServerInstance
)
SELECT @@SERVERNAME -- The name of the server that hosts the central repository
UNION ALL
SELECT 'YourSQLServerInstanceHere' -- Example of a SQL Server instance
UNION ALL
SELECT 'YourSQLServerInstanceHere'; -- Example of a server with multiple instances

We still need to create two tables to store the metadata we collect, but we will create these as we get to the appropriate section in the package. Next, we will create our Integration Services package.

The Iterative Framework

In this section, we lay the foundation for our iterative framework. In other words, we will create a repeatable pattern for populating a variable with a list of SQL Server instances, then iterating through the list and performing an action on each server. Let’s do this now.

First, open SSDT. Create a new project by navigating to File > New > Project. Click Business Intelligence under Installed Templates, and then click Integration Services Project in the Installed Templates window. Name the project MetadataCollection, as illustrated in Figure 1-3.

Figure 1-3. New integration services project

Please note that your default Location will most likely be different from the directory pictured in Figure 1-3.

We now need to create two variables. The first variable will be used to store the list of SQL Server instances we retrieve. The second variable will store a single instance name as we iterate through our list.

To access the variable menu, select Variables under the SSIS menu (Figure 1-4); you can also access the Variables menu by right-clicking the designer surface.

Figure 1-4. Opening the Variables menu

Add the following variables by clicking the Add Variable icon on the far left of the Variables menu, as illustrated in Figure 1-5:

Figure 1-5. Package-scoped variables

  • SQLServerList – Object
  • SQLServerInstanceName – String

Now that we have a place to store our list of instances, we’re ready to retrieve them. Drag a new Execute SQL Task from the SSIS Toolbox onto the designer surface. Rename the task Retrieve SQL Server Instances, and then double-click it to open the Execute SQL Task Editor. Click the drop-down under Connection, and then select <New connection…>, as seen in Figure 1-6.

Figure 1-6. The Execute SQL Task Editor

In the Configure OLE DB Connection Manager menu, click New. In the Server Name field, enter the database server where you created the database in Listing 1-1. Regardless of whether you are using Windows or SQL Server authentication, make sure that the account has sufficient permissions to each of the instances in our dba_monitor_SQLServerInstances table. Under “Select or enter a database name,” select dbaCentralLogging from the drop-down menu, as illustrated in Figure 1-7. Click OK to return to the Execute SQL Task Editor.

Figure 1-7. The Connection Manager

Note  Permissions requirements vary depending on the type of metadata you wish to retrieve. For more information on the permissions necessary to access a specific object, please refer to the object page within SQL Server Books Online.

We now need to write the SQL statement that will retrieve the list of SQL Server instances. Click the ellipsis […] button to the right of the SQLStatement field, and then enter the T-SQL code from Listing 1-4.

Listing 1-4.  T-SQL Statement to Retrieve SQL Server Instances

SELECT SQLServerInstance FROM dbo.dba_monitor_SQLServerInstances;

Because we are retrieving an array of values, select Full result set from the ResultSet drop-down. Your Execute SQL Task Editor should now resemble Figure 1-8; however, your Connection values will likely be different.

Figure 1-8. The Connection Manager

We’re almost done configuring the Connection Manager. All we have left is to map our result set to our variable. Select Result Set on the left side of the Execute SQL Task Editor, and then click Add. Because we are using a full result set, we must replace the Result Name with 0. We now need to tell Integration Services which variable to use. Select User::SQLServerList from the drop-down under Variable Name, as illustrated in Figure 1-9. Click OK.

Figure 1-9. Editing the result set

Our Execute SQL Task is now complete. Next, we need to iterate through each server to retrieve the metadata we plan to monitor. This process will be encapsulated within a Foreach Loop Container, which will shred the list of SQL Server instances stored in the SQLServerList variable.

Add a Foreach Loop Container to the Control Flow and rename it Foreach SQL Server Instance. Connect it to the Execute SQL Task with a Success Precedence Constraint—in other words, drag the green arrow from the Execute SQL Task to the Foreach Loop Container, as seen in Figure 1-10.

Figure 1-10. Connecting the Execute SQL Task to the Foreach Loop Container

Double-click the Foreach Loop Container to edit its properties. Click the Collection page, and then select Foreach ADO Enumerator in the Enumerator field. Under “ADO object source variable,” select User::SQLServerList; leave “Enumeration mode” set to Rows in the first table. Your Collection properties should match those in Figure 1-11.

Figure 1-11. The Foreach Loop Editor

On the Variable Mappings page, map the SQLServerInstanceName variable to Index 0, as demonstrated in Figure 1-12.

Figure 1-12. Variable mappings

Click the OK button to close the Foreach Loop Container Editor.

Let’s review what we’ve done so far. We now have a variable, SQLServerList, which contains a list of all the SQL Server instances we inserted into the dba_monitor_SQLServerInstances table. The Foreach Loop Container then shreds this variable, walking through each value—each SQL Server instance name, in this case—one at a time. At each pass, it pushes the value of one of those SQL Server instance names into another variable, SQLServerInstanceName.

Before we proceed, we need to create the connection that we will use to dynamically connect to each server. We can accomplish this through the use of property expressions. Let’s walk through how to do this now.

Right-click in the Connection Managers window and select New OLE DB Connection. Create a new connection using the same server and security properties we used previously (Figure 1-7), but select master as the database this time. The database server does not really matter as long as you have sufficient permissions, because whatever value we enter will be overwritten by our SQLServerInstanceName variable. The database value does matter, however, because the database we select must exist on every server. Since master is a system database, it is a natural choice.

Click OK to close the Connection Manager Properties window. But we’re not done with this connection just yet. Right-click the newly created connection and select Properties. Change the Name property to DynamicSQLServerInstance, and then click the ellipsis […] button in the Expressions field. This will bring up the Property Expressions Editor. Select the Property value we wish to dynamically populate—ServerName, in this case—and enter @[User::SQLServerInstanceName] in the Expression field, as demonstrated in Figure 1-13. Optionally, you can also click the ellipsis […] button in the Expression field to open the Expression Builder, which is helpful if you are not very familiar with Expression syntax.

Figure 1-13. Property Expressions Editor

The properties of your connection should now resemble those shown in Figure 1-14.

Figure 1-14. Dynamic connection properties

At this point, we now have a reusable framework for iterating through a list of SQL Server instances and doing something on each server. This in and of itself is a very valuable design pattern. However, because this is a chapter on metadata collection, I would be remiss if I did not actually demonstrate collecting and storing metadata. The next section will walk you through setting up two useful metadata extracts.

Metadata Collection

We’re now ready to retrieve metadata from our list of servers. But what should we collect? An incredible wealth of information is available for retrieval, including security information, usage data, table schema snapshots, failed job details, fragmentation levels, and performance counters, to name just a few. For this first example, let’s keep it simple and retrieve current database and log file size. This information is useful for historical database growth and capacity planning.

To accomplish this, we will create Data Flows within our Foreach Loop Container to retrieve the metadata from each server and store it in our dbaCentralLogging database. The Data Flow task is arguably the most frequently used task in Integration Services. It allows you to easily move data between servers and, if necessary, perform data conversions or cleansing.

Drag a Data Flow task from the SSIS Toolbox into the Foreach Loop Container and rename it Database Size. Double-clicking the Data Flow task will open the Data Flow Designer tab. Notice that the objects available within the Toolbox change once you are inside the Data Flow Designer. Drag the OLE DB Source icon into the Designer and rename it Dynamic SQL Source. Double-click it to edit its properties.

Select DynamicSQLServerInstance in the OLE DB Connection Manager drop-down. Change the Data Access Mode to SQL Command, and then copy the code from Listing 1-5 into the SQL Command Text box.

Listing 1-5.  Example of T-SQL to Retrieve Current Data and Log File Sizes for All Databases on the Server

SELECT GETDATE() AS [captureDate]
    , @@SERVERNAME AS [serverName]
    , instance_name AS [databaseName]
    , SUM(
        CASE
            WHEN counter_name = 'Data File(s) Size (KB)'
            THEN cntr_value
        END
      ) AS 'dataSizeInKB'
    , SUM(
        CASE
            WHEN counter_name = 'Log File(s) Size (KB)'
            THEN cntr_value
        END
      ) AS 'logSizeInKB'
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Data File(s) Size (KB)'
                     , 'Log File(s) Size (KB)')
     /* optional: remove _Total to avoid accidentally
        double-counting in queries */
     AND instance_name <> '_Total'
GROUP BY instance_name;

This query will produce results similar to the following (the _Total row appears when the optional instance_name filter is omitted).

captureDate             serverName  databaseName                dataSizeInKB  logSizeInKB
----------------------- ----------- --------------------------- ------------- ------------
2012-04-29 19:52:21.543 LOCALHOST   _Total                      1320896       274288
2012-04-29 19:52:21.543 LOCALHOST   AdventureWorks2012          193536        496
2012-04-29 19:52:21.543 LOCALHOST   dbaCentralLogging           1048576       262136
2012-04-29 19:52:21.543 LOCALHOST   master                      4096          760
2012-04-29 19:52:21.543 LOCALHOST   model                       2112          760
2012-04-29 19:52:21.543 LOCALHOST   msdb                        14080         760
2012-04-29 19:52:21.543 LOCALHOST   mssqlsystemresource         40960         504
2012-04-29 19:52:21.543 LOCALHOST   ReportServer$SQL2012        5184          7032
2012-04-29 19:52:21.543 LOCALHOST   ReportServer$SQL2012TempDB  4160          1080
2012-04-29 19:52:21.543 LOCALHOST   tempdb                      8192          760

(10 row(s) affected)

Your OLE DB Source Editor should now resemble the Editor in Figure 1-15. Click Parse Query to ensure the SQL syntax is correct, and then click Preview at the bottom of the Editor to see a sample of the results. Click OK to exit the OLE DB Source Editor.

Figure 1-15. OLE DB Source Editor

Let’s take a moment to discuss this code. We are using the sys.dm_os_performance_counters DMV to retrieve data file and log file sizes. This DMV stores data and log sizes in a separate row for each database, so we are pivoting the data to return one row for each database, with the file size and log size in separate columns. As a reminder, DMVs were introduced in SQL Server 2005, so this example will only work in SQL Server 2005 and newer editions.

It is generally a best practice to create stored procedures for these types of administrative queries and to deploy them to each server, typically into a database like dbaToolBox. This introduces some maintenance overhead, but the benefits of stored procedures—such as visibility into dependencies, usage, performance tuning, and troubleshooting—typically outweigh that overhead. It also allows a DBA or developer to manually execute these same queries on each server without having to search for the code within an Integration Services package. However, in the interests of simplicity, we will just input the code directly into our Data Flow task.
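
If you do go the stored procedure route, a minimal sketch might look like the following; the dbaToolBox database and procedure name are illustrative only and are not part of the package we build in this chapter:

USE dbaToolBox;
GO

CREATE PROCEDURE dbo.dba_getDatabaseFileSizes
AS
BEGIN
    SET NOCOUNT ON;

    /* Same query as Listing 1-5, deployed once per server */
    SELECT GETDATE() AS [captureDate]
        , @@SERVERNAME AS [serverName]
        , instance_name AS [databaseName]
        , SUM(CASE WHEN counter_name = 'Data File(s) Size (KB)'
                   THEN cntr_value END) AS [dataSizeInKB]
        , SUM(CASE WHEN counter_name = 'Log File(s) Size (KB)'
                   THEN cntr_value END) AS [logSizeInKB]
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN ('Data File(s) Size (KB)', 'Log File(s) Size (KB)')
        AND instance_name <> '_Total'
    GROUP BY instance_name;
END
GO

With the procedure in place, the SQL command in the OLE DB Source shrinks to a single EXECUTE dbaToolBox.dbo.dba_getDatabaseFileSizes; statement.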

Tip  The sys.dm_os_performance_counters DMV is very useful for database monitoring and contains much more information than just data and log file sizes. You can easily modify the preceding code to include additional performance counters. However, you should be aware that there are several types of cntr_type values (such as Value/Base, Per Second, and Point-In-Time), and the preceding code only works for the Point-In-Time counter type (cntr_type = 65792). Refer to SQL Server Books Online for more information on the types of information available in this DMV and how to work with each counter type.
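
For instance, filtering on cntr_type = 65792 restricts the results to point-in-time counters, and a counter such as Page life expectancy can then be collected with the same pattern; a sketch:

SELECT GETDATE() AS [captureDate]
    , @@SERVERNAME AS [serverName]
    , RTRIM(counter_name) AS [counterName]
    , cntr_value
FROM sys.dm_os_performance_counters
WHERE cntr_type = 65792 /* point-in-time counters only */
    AND counter_name = 'Page life expectancy';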

Now that we know what our query footprint will look like, we need to create a table to store the results in. From within SSMS, execute the T-SQL statement in Listing 1-6 within the dbaCentralLogging database.

Listing 1-6.  Example of T-SQL Code to Create a Table to Store Data and Log File Size Information

USE dbaCentralLogging;
GO

CREATE TABLE dbo.dba_monitor_databaseGrowth
(
    log_id INT IDENTITY(1,1)
    ,captureDate DATETIME
    ,serverName NVARCHAR(128)
    ,databaseName SYSNAME
    ,dataSizeInKB BIGINT
    ,logSizeInKB BIGINT

    CONSTRAINT PK_dba_monitor_databaseGrowth
        PRIMARY KEY NONCLUSTERED(log_id)
);
CREATE CLUSTERED INDEX CIX_dba_monitor_databaseGrowth
    ON dbo.dba_monitor_databaseGrowth(captureDate,serverName,databaseName);
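
Once the package has been running on a schedule for a while, this table makes simple growth reporting possible. As a rough sketch (it assumes the file sizes only grow between the first and most recent samples):

SELECT serverName
    , databaseName
    , MIN(captureDate) AS firstCapture
    , MAX(captureDate) AS lastCapture
    , MAX(dataSizeInKB) - MIN(dataSizeInKB) AS dataGrowthInKB
    , MAX(logSizeInKB) - MIN(logSizeInKB) AS logGrowthInKB
FROM dbo.dba_monitor_databaseGrowth
GROUP BY serverName, databaseName
ORDER BY dataGrowthInKB DESC;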

We can now return to our Integration Services package. We do not need to perform any data cleansing or data transformations in this Data Flow task, so we’ll proceed straight to storing our results. Select the OLE DB Destination item from the Toolbox and rename it Central Logging Destination. Connect it to the OLE DB Source by dragging the blue data flow arrow from the source to the destination. Double-clicking the OLE DB Destination brings up another Editor. This time, select your dbaCentralLogging connection from the OLE DB Connection Manager drop-down. Leave Table or view – fast load selected in the Data Access Mode drop-down. In the “Name of the table or the view” drop-down, select [dbo].[dba_monitor_databaseGrowth], as seen in Figure 1-16.

Figure 1-16. Editing the OLE DB destination connection manager

When you’re done with the Connection Manager, click the Mappings menu. You’ll notice that Integration Services has taken the liberty to perform an initial mapping based on column names. While this is a nice time-saving feature, be wary in environments where the same column name is used for multiple data elements. Because the log_id column is an identity value that is populated during data insertion, we will ignore it in our mappings. Confirm that your mappings resemble those shown in Figure 1-17, and then click OK to return to the Data Flow designer.

Figure 1-17. Editing the OLE DB destination mappings

Our first Data Flow is complete, as seen in Figure 1-18.

Figure 1-18. The completed Data Flow task

We are now ready to create our second Data Flow. From the Control Flow tab, copy and paste the existing Data Flow into the Foreach Loop Container. Drag the green arrow—the Success Precedence Constraint—from the Database Size Data Flow to our new Data Flow. Rename the new Data Flow as Unused Indexes, and then double-click it to return to the Data Flow designer.

Double-click the Dynamic SQL Source OLE DB Source to edit its properties. We need to change the SQL command to use the code in Listing 1-7.

Listing 1-7.  Example of T-SQL Query to Retrieve Unused Indexes

/* Create a variable to hold a list of indexes */
DECLARE @Indexes    TABLE
( serverName        NVARCHAR(128)
, schemaName        SYSNAME
, schemaID          INT
, databaseName      SYSNAME
, databaseID        INT
, tableName         SYSNAME
, objectID          INT
, indexName         SYSNAME
, indexID           INT
, indexType         NVARCHAR(60)
, isPrimaryKey      BIT
, isUnique          BIT
, isFiltered        BIT
, isPartitioned     BIT
, numberOfRows      BIGINT
, totalPages        BIGINT);
/* Iterate through all databases */
INSERT INTO @Indexes (serverName, schemaName, schemaID, databaseName, databaseID, tableName, objectID, indexName, indexID, indexType, isPrimaryKey, isUnique, isFiltered, isPartitioned, numberOfRows, totalPages)
EXECUTE sys.sp_MSforeachdb
'USE [?];
SELECT @@SERVERNAME
    , SCHEMA_NAME(t.schema_id)
    , t.schema_id
    , DB_NAME()
    , DB_ID()
    , t.name
    , t.object_id
    , i.name
    , i.index_id
    , i.type_desc
    , i.is_primary_key
    , i.is_unique
    , i.has_filter
    , CASE WHEN COUNT(p.partition_id) > 1 THEN 1 ELSE 0 END
    , SUM(p.rows)
    , SUM(au.total_pages)
FROM sys.tables AS t WITH (NOLOCK)
JOIN sys.indexes AS i WITH (NOLOCK)
ON i.object_id = t.object_id
JOIN sys.partitions AS p WITH (NOLOCK)
ON p.object_id = i.object_id
AND p.index_id = i.index_id
JOIN sys.allocation_units AS au WITH (NOLOCK)
ON au.container_id = p.partition_id
WHERE i.index_id <> 0 /* exclude heaps */
GROUP BY SCHEMA_NAME(t.schema_id)
, t.schema_id
, t.name
, t.object_id
, i.name
, i.index_id
, i.type_desc
, i.has_filter
, i.is_unique
, i.is_primary_key;';
/* Retrieve index stats for return to our central repository */
SELECT GETDATE() AS [captureDate]
    , i.serverName
    , i.schemaName
    , i.databaseName
    , i.tableName
    , i.indexName
    , i.indexType
    , i.isFiltered
    , i.isPartitioned
    , i.numberOfRows
    , ddius.user_seeks AS [userSeeksSinceReboot]
    , ddius.user_scans AS [userScansSinceReboot]
    , ddius.user_lookups AS [userLookupsSinceReboot]
    , ddius.user_updates AS [userUpdatesSinceReboot]
    , (i.totalPages * 8) / 1024 AS [indexSizeInMB] /* pages are 8KB */
    , dosi.sqlserver_start_time AS [lastReboot]
FROM @Indexes AS i
JOIN sys.dm_db_index_usage_stats AS ddius
ON i.databaseID = ddius.database_id
AND i.objectID = ddius.object_id
AND i.indexID = ddius.index_id
CROSS APPLY sys.dm_os_sys_info AS dosi
WHERE /* exclude system databases */
i.databaseName NOT IN ('master','msdb','tempdb','model')
/* exclude unique indexes; assume they are serving a business function */
AND i.isUnique = 0
/* exclude primary keys; assume they are serving a business function */
AND i.isPrimaryKey = 0
/* no seeks have been performed since the last server reboot */
AND user_seeks = 0;

Tip  The T-SQL in Listing 1-7 is just a starting point. This query can be easily modified to return information such as which clustered indexes may warrant redesign, which tables have the most updates, and which tables are the most frequently queried.

An example of the output follows.

captureDate             serverName  schemaName  databaseName        tableName
----------------------- ----------- ----------- ------------------- -------------------
2012-04-29 19:37:36.927 LOCALHOST   Production  AdventureWorks2012  TransactionHistory
2012-04-29 19:37:36.927 LOCALHOST   Production  AdventureWorks2012  TransactionHistory
2012-04-29 19:37:36.927 LOCALHOST   Sales       AdventureWorks2012  SalesOrderDetail

indexName                               indexType     isFiltered  isPartitioned  numberOfRows
--------------------------------------- ------------- ----------- -------------- -------------
IX_TransactionHistory_ProductID         NONCLUSTERED  0           0              1134431
IX_TransactionHistory_ReferenceOrderID  NONCLUSTERED  0           0              1134431
IX_SalesOrderDetail_ProductID           NONCLUSTERED  0           1              1213178

userSeeksSinceReboot  userScansSinceReboot  userLookupsSinceReboot  userUpdatesSinceReboot
--------------------- --------------------- ----------------------- -----------------------
0                     0                     0                       98
0                     8                     0                       98
0                     2                     0                       124

indexSizeInMB  lastReboot
-------------- -----------------------
9              2012-04-28 19:15:28.837
21             2012-04-28 19:15:28.837
28             2012-04-28 19:15:28.837

As you can see, this query is a bit more complex than the last one. Let’s discuss what we’re doing. Developers are usually very good at identifying performance issues. Why? When a query is slow, someone is usually complaining about it! It’s not uncommon for the fix to involve the creation of an index, which can reduce I/O and improve query duration. Over time, however, the query may change—resulting in a different index being used by the optimizer—or perhaps the query is no longer needed. Unlike the more attention-getting performance issues, these types of changes tend to creep up silently over time. Eventually that same index, which was so beneficial when it was being used, ends up consuming unnecessary resources—it slows down inserts, consumes precious disk space, and inflates backups.

One way to stay on top of unused indexes is to search the sys.dm_db_index_usage_stats DMV. This DMV keeps track of index usage information, including how many times an index has been seeked or scanned and how many updates have been performed. These counters are reset whenever the SQL Server service restarts, so a server that has been restarted recently may show an inaccurately high number of “unused” indexes. Also, this information is merely a starting point for further research into whether an index should be dropped or redesigned; many organizations have indexes that are not called frequently but are necessary for important monthly or annual reports.
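
One simple guard is to check the uptime before trusting the results. The sketch below prints a warning when the instance has been up for fewer than 30 days; the threshold is arbitrary and should be tuned to your environment:

IF EXISTS ( SELECT 1
            FROM sys.dm_os_sys_info
            WHERE DATEDIFF(DAY, sqlserver_start_time, GETDATE()) >= 30 )
    PRINT 'Instance has been up at least 30 days; usage counters are meaningful.';
ELSE
    PRINT 'Recent restart detected; unused-index counters may be misleading.';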

One other important thing to note is that this script makes use of the undocumented sp_MSforeachdb stored procedure. This stored procedure iterates through every database, executing whatever command is passed to it. For numerous reasons—not the least of which is the fact that it is an undocumented, and therefore unsupported, stored procedure that may occasionally skip databases—I recommend using Aaron Bertrand’s sp_foreachdb stored procedure instead. However, once more in the interests of simplicity, we will use the sp_MSforeachdb procedure in our example.
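
If you prefer to avoid the undocumented procedure entirely, a plain cursor over sys.databases accomplishes the same iteration. The following sketch runs a trivial command against each online database; in practice, you would substitute the per-database query from Listing 1-7:

DECLARE @databaseName SYSNAME
    , @sql NVARCHAR(MAX);

DECLARE database_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE state_desc = 'ONLINE'; /* skip offline and restoring databases */

OPEN database_cursor;
FETCH NEXT FROM database_cursor INTO @databaseName;

WHILE @@FETCH_STATUS = 0
BEGIN
    /* Build and run a per-database command */
    SET @sql = N'USE ' + QUOTENAME(@databaseName)
             + N'; SELECT DB_NAME() AS currentDatabase;';
    EXECUTE sys.sp_executesql @sql;

    FETCH NEXT FROM database_cursor INTO @databaseName;
END

CLOSE database_cursor;
DEALLOCATE database_cursor;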

Tip  Aaron Bertrand’s sp_foreachdb stored procedure can be found at www.mssqltips.com/sqlservertip/2201/making-a-more-reliable-and-flexible-spmsforeachdb/ or by searching for “MSSQLTips sp_foreachdb” in your favorite search engine.

Now that we understand a little more about the metadata we are retrieving, let’s return to our package. Click Parse Query to ensure you do not have any errors in your syntax, and then click Preview to see a sample of the results. Click the Columns page to ensure that the column list has been successfully updated. Then, click OK to return to the Data Flow designer.

You should now see an error in your Data Flow, as illustrated in Figure 1-19. This is expected because we’ve changed the columns that our data source is providing, but our destination still expects the old column list.

Figure 1-19. The completed Data Flow task

Before we can fix this error, we need to return to SSMS to create the table we will use to store our unused index data, dba_monitor_unusedIndexes. Do so now, using the code in Listing 1-8.

Listing 1-8.  T-SQL Code to Create the dba_monitor_unusedIndexes Table

USE dbaCentralLogging;
GO
CREATE TABLE dbo.dba_monitor_unusedIndexes
(log_id INT IDENTITY(1,1)
,captureDate DATETIME
,serverName NVARCHAR(128)
,schemaName SYSNAME
,databaseName SYSNAME
,tableName SYSNAME
,indexName SYSNAME
,indexType NVARCHAR(60)
,isFiltered BIT
,isPartitioned BIT
,numberOfRows BIGINT
,userSeeksSinceReboot BIGINT
,userScansSinceReboot BIGINT
,userLookupsSinceReboot BIGINT
,userUpdatesSinceReboot BIGINT
,indexSizeInMB BIGINT
,lastReboot DATETIME
    CONSTRAINT PK_dba_monitor_unusedIndexes
        PRIMARY KEY NONCLUSTERED(log_id)
);
CREATE CLUSTERED INDEX CIX_dba_monitor_unusedIndexes
    ON dbo.dba_monitor_unusedIndexes(captureDate);

Returning to Integration Services, double-click the Central Logging Destination to edit its properties. Change the “Name of the table or the view” value to [dbo].[dba_monitor_unusedIndexes], and then click the Mappings page. Because our source and destination are using the same column names, we can easily update the mappings by right-clicking in the space between Available Input Columns and Available Destination Columns and then selecting Map Items by Matching Names. Figure 1-20 illustrates this option.

Figure 1-20. The Map Items by Matching Names option in the Mappings page

Once more, the log_id column will not map to anything because it is an identity column. Click OK to return to the Data Flow designer, and then click the Control Flow tab.

See how quickly that second Data Flow went? You can continue to easily add more metadata collection tasks using this same method. All that we have left to do is to update the LastMonitored column in the dba_monitor_SQLServerInstances table.

Tip  It may be tempting to create a one-size-fits-all package. However, it is generally a better idea to separate metadata collections into separate packages organized by frequency requirements. For example, the metadata we have collected in this chapter only requires periodic samples, such as daily or weekly collection. Metadata that requires more frequent collection, such as an hourly check for failed SQL Agent jobs, should be stored in a separate package.

Add an Execute SQL Task to our Foreach Loop Container and rename it Update Last Monitored. Connect the Unused Indexes Data Flow to the Update Last Monitored Execute SQL Task. Double-click the Execute SQL Task to edit its properties. Select the dbaCentralLogging connection in the Connection drop-down, and then enter the code from Listing 1-9 in the SQLStatement field.

Listing 1-9.  T-SQL Code to Update the LastMonitored Value in dba_monitor_SQLServerInstances

UPDATE dbo.dba_monitor_SQLServerInstances
SET LastMonitored = GETDATE()
WHERE SQLServerInstance = ?;

The question mark (?) tells the Execute SQL Task that we are passing a parameter to this SQL statement. Now we just need to map our variable to our parameter. To do this, click the Parameter Mapping page and click Add. Edit the properties as follows:

  • Variable Name = User::SQLServerInstanceName
  • Direction = Input
  • Data Type = NVARCHAR
  • Parameter Name = 0
  • Parameter Size = 128

Confirm that your mappings match those shown in Figure 1-21, and then click OK.

Figure 1-21. Parameter mapping in the Execute SQL Task Editor

We are now ready to execute our package! To do this, you can select Debug > Start Debugging from the menu, click the green Run icon in the toolbar, or press F5. Your package should resemble Figure 1-22 upon successful execution.

Figure 1-22. Successful execution of the MetadataCollection package

Congratulations! You have now collected metadata from a list of servers and stored the results in your dbaCentralLogging database.

This concludes our walk-through on collecting metadata via SSIS. However, there are many more tasks that a diligent developer or DBA may want to consider. First, as we discussed early on in this chapter, this package does not contain any exception handling or logging, which is outside the scope of this chapter; a best practice is to include some sort of exception handling and logging in every package. Second, we have only tipped the proverbial iceberg when it comes to collecting metadata. There is much more information to consider, such as security audits, error logs, SQL Server Agent job statuses, and much, much more. If you’re not sure where to start, consider ranking metadata tasks by criticality and working in descending order of importance. As a last piece of homework, you may want to consider setting up monitors to alert when unfavorable conditions are met (for example, a SQL Server Agent job has failed or available disk space is getting low).
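
For example, a periodic check for failed SQL Server Agent jobs can be built on the job history tables in msdb; a sketch (step_id = 0 marks the job outcome row, and run_status = 0 indicates failure):

SELECT j.name AS jobName
    , h.run_date
    , h.run_time
    , h.message
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobhistory AS h
    ON h.job_id = j.job_id
WHERE h.step_id = 0 /* job outcome rows, not individual steps */
    AND h.run_status = 0; /* 0 = failed */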

Summary

In this chapter, we discussed the importance of metadata. We explored some of the metadata that exists within SQL Server and provided two examples of valuable T-SQL metadata queries. We identified a very flexible and reusable pattern for collecting database metadata in an enterprise environment. Lastly, we created an Integration Services package that does the following:

  1. Identifies a list of SQL Server instances on our network.
  2. Iterates through the list of instances and

    a.   Connects to each SQL Server instance.

    b.   Retrieves one or more pieces of information.

    c.   Logs the results to our centralized repository.
