The Multiphase Data Pump

Much of the real functionality of DTS is encapsulated in its multiphase data pump. It's the engine behind the Transform Data task, the Data Driven Query task, and the Parallel Data Pump task. As you might expect, the main purpose of the pump is to move and transform data between data sources.

The data pump goes through six basic steps or phases during the transformation of data from one source to another. These phases can be exposed as events to which you can attach scripting code in order to customize the behavior of the data pump. Note that the row transformation phase is the only data pump phase to which you can attach code by default. In order to make the other phases visible to your package and scripting code, you must enable multiphase pump display in the DTS Designer by right-clicking the Data Transformation Services node in Enterprise Manager and checking the Show multi-phase pump in DTS Designer option. Once the multiphase pump has been fully exposed in the Designer, you can attach ActiveX script code to phase events to customize the behavior of the pump. As I've said, the data pump goes through six basic phases when transforming data.

1.
Presource phase— this step occurs before any rows are actually read from the source data. This is a good place to create header rows or carry out other preparatory work prior to the start of the row transformation process.

2.
Row transform phase— this is the default data pump phase and is the step where each row is read from the source data and optionally transformed.

3.
Post–row transform phase— this step occurs after the row transform phase has completed and is itself composed of three subphases.

  1. On Transform Failure— this phase is executed when an error occurs during row transformation. Note that any errors generated during this phase do not count toward the maximum number of errors specified in the Options page for the transformation. You can use script code to detect errors during this phase and respond accordingly. Based on what happens in this step, one of the two following substeps then executes.

  2. On Insert Success— this phase occurs when a row is successfully inserted into the destination rowset. (Understand that no data is actually written to the destination at this point—the writes are to the destination rowset, a cache that will later be written to the destination.)

  3. On Insert Failure— this phase is executed when the insertion of a transformed row into the destination rowset fails. This error does not count toward the maximum specified in the Options page for the transformation.

4.
Batch complete phase— this phase occurs whenever the number of rows inserted into the destination rowset equals the batch size specified in the Options page for the transformation. It also occurs when all the rows in the source data have been processed and there is at least one row in the destination rowset. When this phase occurs, the pump writes the rows in the destination rowset to the destination table. Depending on whether you've configured the package to use transactions, an error during this phase may result in only some of the rows being written to the destination. This phase executes on completion of a batch, regardless of whether it's successfully written to the destination. Note that it may not execute in some circumstances, such as when the insertion of each source row into the destination rowset fails. In that scenario, there's no batch to write to the destination because the destination rowset is empty.

5.
Postsource phase— this is the corollary to the presource phase and allows you to hook up script code that performs some task after all rows have been transformed. You can still access the destination data at this point, so you could use this phase to write out summary rows or further interact with the destination data in some way.

6.
Pump complete phase— this phase occurs after all processing is complete, just before the pump shuts down. Although you no longer have access to either source or destination data at this point, you still have the full ActiveX scripting environment at your disposal to do things such as write audit or log records, interact with the file system, and so on.

You attach script code to the other phases in a data pump transformation just as you would normally attach it to the row transform phase: You click the Transformations tab, then select the phase to which you want to attach code in the Phases filter combo box and click the New button to set it up. Select the ActiveX script option in the Create New Transformation dialog, then click the Properties button in the Transformation Options dialog in order to add your scripting code. Configure the source and destination columns as appropriate and save the transformation. When you run the package, the code you attached to the specified pump phase will execute as appropriate.

To experiment with this a bit, load the MultiphaseDataPumpExample.DTS package from the CH20 folder on the CD accompanying this book into the DTS Designer. This is a basic package that simply copies a table from the pubs database to the Northwind database. I've hooked ActiveX script code to each of the pump phase events so that you can see how these work. Set your connection properties as appropriate, then run the package. You should see a dialog box displayed for each event. Note that some events occur more than others—for example, you should see close to two dozen Insert Success events—one for each row copied from the source to the destination.

NOTE: All of the example packages mentioned in this chapter use “(local)” when referring to a SQL Server. If the server on which you'll be testing these packages is a named instance, you can create a client configuration alias named “(local)” in order to avoid having to modify the example packages before running them.


If you bring up the options dialog for each of the ActiveX transformations, you'll see a Phases tab that allows you to specify which phase(s) the code is to be associated with. By default, this is the same as the phase selected in the Phase filter combo box when the transformation was first created. You'll note that the last two items in the dialog are reversed in terms of when they execute: the pump complete event actually occurs after the postsource event.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.119.114