In the first section of this book, we looked at patterns focused on the control flow area of SQL Server Integration Services, including metadata, workflow execution, and scripting. The second section focuses on the data flow area of SQL Server Integration Services. This and the following chapters will discuss source, transformation, and logging patterns in the pipeline area of an Integration Services package.
Integration Services supports a wide variety of sources, including SQL Server, Oracle, and SAP. In addition, developers and third-party vendors have the ability to create custom sources for providers not included out-of-the-box. This technology-agnostic approach creates a very flexible system for loading all sorts of data. Even with all of the potential sources, loading data into or out of a SQL Server database is a very common occurrence, since a company that owns Integration Services typically uses all Microsoft products.
This chapter discusses different patterns associated with using SQL Server as a source. Due to the common occurrence of SQL Server databases in shops using Integration Services, we have a defined set of patterns for extracting data from SQL Server. Specifically, we will look at the best way to connect to a SQL Server database, how to choose the data you will use, and how to more easily use the rest of the data flow’s objects. Finally, we will look at a new component in SQL Server Denali that helps jump-start your development when connecting to any source.
When connecting to external data, Integration Services uses a few objects to help make the connection, retrieve the correct data, and start any necessary data manipulations. Every time an Integration Services developer creates a package, the developer will need to select the correct objects and ensure they have all been created. The objects that will need to be set up are as follows:
Let’s take a look at the important factors associated with each of these items. We’ll begin in the next section by looking at connection managers.
Selecting a SQL Server Connection Manager and Provider
Between ADO.NET, ODBC, and OLE DB, there are enough connection managers to make you want to pull your hair out! All of these connection managers will connect to SQL Server, so how do you know when you should use which one? To answer that question, let’s talk about what the connection manager is actually doing and then look at each connection manager type that you can use to connect to SQL Server.
A connection manager is the object that holds connection information for an external source, akin to an application data source or a Reporting Services shared data source. The connection manager provides an abstraction layer between Integration Services and the rest of the components so that information about the external source can be modified in one place to affect all tasks and components. To see all of the connection manager types available, take a look at Figure 4-1.
Figure 4-1. Connection manager types
These are the three connection manager types that you can use to connect to a SQL Server database:
Let’s take a look at each connection manager type individually.
ADO.NET
The ADO.NET connection manager type is used to make a connection through the .NET Framework. Not only can this type be used for SQL Server, but it also provides access to other applications and other databases. The ADO.NET layer quickly retrieves data from the source using a DataReader object in the .NET Framework.
The ADO.NET connection manager for SQL Server is best used as a source when you are using it elsewhere in the package. For example, the Lookup component uses an ADO.NET connection manager to connect, so then you should use it as a source. On the other hand, if a component uses another connection manager type, stick with that connection manager type. Consistency is really the key here. For a sample connection manager property window set to connect to the AdventuresWorks2012 database on the same server, see Figure 4-2.
Figure 4-2. ADO.NET connection manager property screen
ODBC
ODBC is the open database connectivity standard. Its purpose is to allow connections from any application to any database, regardless of the vendor. Often, an organization will use DSNs (data source names) to create an abstraction layer between the application and the connection string the ODBC provider uses. If you have an organization that really wants to use DSNs with SQL Server, ODBC is the option for you. Otherwise, stick with an ADO.NET or OLE DB connection manager.
SSIS did not have an ODBC Source component prior to SQL Server 2012. Instead, you could and still can use the ADO.NET connection manager with a few tweaks. After you create the ADO.NET connection manager, change the provider at the top of the window to Odbc Data Provider, as shown in Figure 4-3.
Figure 4-3. ADO.NET providers
Then add the DSN name or connection string. For our local AdventureWorks2012 database, the connection string will look like Listing 4-1.
Listing 4-1. ODBC Connection String
Driver={SQL Server Native Client 11.0};
Server=localhost;
Database=AdventureWorks2012;
Trusted_Connection=yes
Our completed Connection Manager screen for an ADO.NET connection with an ODBC provider looks like Figure 4-4.
Figure 4-4. ODBC ADO.NET connection manager property screen
If you are using SQL Server 2012 or later, you can use the ODBC Source component. Configuring the ODBC Source is very similar to the ADO.NET Source, except that you use an ODBC connection manager. The ODBC Source provides additional performance benefits over ADO.NET; however, it requires SQL Server Enterprise Edition to run outside of the SQL Server Data Tools environment.
Finally, we move on to what is arguably the most common connection manager used to connect to SQL Server: OLE DB. The OLE DB protocol was written by Microsoft as the next version of the ODBC provider. In addition to SQL Server databases, you can use the OLE DB connection manager to connect to file-based databases or Excel spreadsheets.
OLE DB tends to be my default when I’m connecting to a SQL Server database. If you do not fall into the category of using mostly components that use an ADO.NET connection manager, and you do not fall into the category of having an organization that wants to use a DSN, you will want to use an OLE DB connection manager.
You can fill out the property screen of the OLE DB connection manager as shown in Figure 4-5.
Figure 4-5. OLE DB connection manager property screen
Creating a SQL Server Source Component
Once you’ve picked the correct connection manager and provider, you need to use them in a Source Component for our data pull. Begin by looking at the SSIS Toolbox when you are on the Data Flow tab. If you have not rearranged the SSIS Toolbox, you will see all possible sources under the Other Sources grouping, as seen in Figure 4-6.
Figure 4-6. Other Sources grouping in the SSIS Toolbox
Now it’s time to choose one.
Most of the hard decision-making was already completed when you set up the connection manager. If you use an OLE DB connection manager, then you must use the OLE DB source. If you decided to use an ADO.NET connection manager with either an ADO.NET or ODBC provider, you must use the ADO NET Source.
Once you drag the desired source onto the Data Flow design window, the data flow contains one component, as shown in Figure 4-7. Integration Services lets the developer know that there is an issue with the source by displaying the red circle with the white X in it. In this case, the issue is that you have not yet set any of the source’s properties, starting with the destination table, as shown in the tooltip.
Figure 4-7. Data Flow task with new source
To open the source component, either double-click the component or right-click and select Edit. Inside the source component, you can fill in the Connection Manager property. The first connection manager of the appropriate type will automatically be populated, but selecting the drop-down list arrow will let you select any of the other connection managers that match. You can see the OLE DB Source Editor screen up to this point in Figure 4-8.
Figure 4-8. Initial OLE DB Source Editor screen
Although most of the decision-making work was already done when you created the connection manager, it is important to understand the part the source plays in the Integration Services package. The source is the glue that holds all of the other pieces together and ensures that you have one place to go to for future maintenance issues or changes. Setting up the SQL Server source was an easy step to get you warmed up before you move on to create and optimize the query that the source uses.
Writing a SQL Server Source Component Query
Now that you have walked through the creation of the connection manager and provider and decided which source component to use, you need to set up the metadata for pulling the data. You do this by selecting what type of access you want to make and then adding the query information to the source component. In addition, you will want to review a few patterns when you’re setting up the query and column metadata. Let’s get started.
If you decided to use the ADO.NET source component, either with the ADO.NET or ODBC provider, you have two options to select what data you want to see:
Because the Table or View option is not our recommended option, let’s dig into the SQL Command option a little deeper. You can enter either a direct SQL query that returns a dataset or a stored procedure using the EXEC statement.
Whether you are using a SQL query or executing a stored procedure, you will need to be aware that the ADO.NET source does not allow you to use parameters in your query. If you need to modify the query that gets used, you will need to use an expression. Expressions are only set at the control flow level, so you will need to take a look there to set up your expression. Follow these steps to set a new SQL command at design time:
Figure 4-9. Property Expressions Editor with SQLCommand property
When the package runs, it will now use the expression you just created.
If you’ve selected the OLE DB source component, you have four options for how to retrieve data:
By picking one of these options, you will determine how the data is returned from SQL Server. After you select the type of data retrieval, you’ll want to add the appropriate properties. For example, if you select the Table or View option, you’ll need to select the object that contains the data. If you select the SQL Command option, you’ll need to enter the SQL query or stored procedure execution that returns the data. Once that is set, you can move on to designing the rest of your data flow.
Waste Not, Want Not
As data professionals, we often think that the more data we can get, the better. This isn’t always the best scenario when you are dealing with sources. When you are talking about the amount of data to pull, you will want to follow a different pattern.
No matter which query option you selected, it is important to only ask for the columns that you need in your data load process. Requesting all columns is similar to running a select * from table query against a database. Not only are you asking the database and network to do more work, but you are also asking Integration Services to do more work. All of that unnecessary data will get stored in memory, or even cause paging if there isn’t enough memory, using up space that could be used to grab more data for the important columns and slowing down the overall package execution.
All source components give you the option to pick a subset of columns on the Columns menu. Be sure to make the column reduction in the query itself rather than in the Columns menu to reap the full benefit of a faster package.
Another Integration Services source trap that is easy to fall into is to perform the majority of the data transformations in the source query itself. Because Integration Services developers often have a SQL background, we tend to want to use a familiar tool to accomplish our tasks.
The types of data transformations that can be undertaken in either the source query or the rest of the data flow include merging of datasets, case statements, string concatenation, and more. Remember that SQL Server is very good at set-based actions, whereas Integration Services is very good at computationally expensive tasks that use a lot of memory. Although you should test your individual situation, these are good rules of thumb to follow.
Follow the pattern listed in Table 4-1 when you are deciding where to put your data translation logic.
Table 4-1. Data Translation Locations
Data Translation |
Concern |
Location |
---|---|---|
Merge datasets |
Set-based |
Source component |
Case statements |
Memory intensive |
Data flow |
String concatenation |
Procedural |
Data flow |
Sorting data |
Set-based |
Source component |
Now that you have retrieved data from SQL Server the hard way, you’re going to learn the easy way to do the same thing. The Source Assistant is a new wizard introduced in SQL Server 2012 that takes you through the steps of setting up your Source objects without having to make many of the same decisions that you just had to go through. This is a great way for people who are just getting started with Integration Services to get up and running quickly.
To begin, create a new Data Flow task. As seen in Figure 4-10, the Source Assistant appears in the SSIS Toolbox. Initially it will be in the Favorites grouping, unless you have moved the items around.
Figure 4-10. Source Assistant in Favorites grouping in SSIS Toolbox
Drag the Source Assistant component onto the Data Flow design area to start the wizard. The first screen can be seen in Figure 4-11.
Figure 4-11. Add New Source screen in Source Assistant
To begin, you can see that there are only a few types listed for you to use: SQL Server, Excel, Flat File, and Oracle. If you want to see a list of components available if you install the providers, you can uncheck the Show Only Installed Source Types option. By only offering the one SQL Server choice, Integration Services is making your life easier by directing you to the correct provider immediately. When you select the Source Assistant’s SQL Server type, it will use the OLE DB connection manager, which is also your go-to connection manager!
Once you select the SQL Server type, you have the option of selecting an existing connection manager from the pane on the right or creating a new one. Creating a new connection manager will take you through the exact steps you looked at previously for setting up an OLE DB connection manager.
Finally, pick your new or existing connection manager and click the OK button. This will create the connection manager and add the SQL Server source to the Data Flow task. You can then immediately pick up your development by creating and optimizing the SQL query. The Source Assistant is a great way to get started with developing your Integration Services package, especially if you are new to Integration Services. If you know that you want to use one of the other types of connections, you can just create the connection manager and source directly, without having to use the Source Assistant. Either way, you have a few ways to start your development as quickly as possible.
Summary
At this point, we described why you would want to use certain SQL Server sources over others, how to set up the source, and how to clean up the source query to get the best performance out of your package. We also covered sources in general to set up the rest of the source chapters.
Although all of the principles described in this chapter are patterns for SQL Server, you can apply many of them to other source types as well. Be sure to review the rest of the source chapters for patterns that you can use for SQL Server in addition to what we have already discussed.
18.119.103.204