Chapter 5. Creating a Data Source View

Step 1: Get data. That's what we have to do if we're going to build a cube.

To get data into an Analysis Services cube, we need to build a data source view (DSV). The data source view is how we represent complex relational data models for our dimension and cube design. Before we create a DSV, we'll need to create the data sources we will use to populate it. And before we do that, we'll need to create a project in BIDS.

We'll do everything I've just mentioned in this chapter, after we cover some introductory material about the data for our cube.

Cubes Need Data

A data source view (Figure 5-1) is the layer of abstraction between a cube and the underlying data source. Although it looks like a simple ERD from a database, the important thing to note is that we're able to map tables from different data sources together into a single unified schema. Traditionally, an OLAP solution would require an OLAP-specific data store, providing the views and data structures necessary to build the cube and dimension structure. SSAS does this virtually—by building the data structure in a data source view, we can skip the step of building it physically.

A data source view

Figure 5.1. A data source view

In addition, by having multiple data source views, we can keep the alignment between cubes and data sources clearer. As I pointed out in Chapter 3, SSAS uses the concept of a universal data model to get away from the need for multiple data marts; using one or more data source views is part of that architecture.

You should notice something familiar about Figure 5-1: it looks like a diagram for a relational database. That's effectively what it is. We use the data source view to create a virtual schema representing relational data we can use to build our cube from. But again, because we're doing this in the Analysis Services server, we can map to tables from various servers. If we have the keys necessary to link the tables in our source systems, we can build a cube directly from those data sources—no staging database, no data warehouse, no data marts!

Mind you, in all likelihood you'll still need a staging database. The data needs to be cleaned and normalized (the key in one database is very unlikely to match to the corresponding key in another). So when you're aggregating the data, set up a staging database to act as the data source for our data source view.

Data Sources

Before we can start building a data source view, we need data sources. An SSAS database can have numerous data connections (Figure 5-2), so we can have multiple data source views, and individual views can draw from multiple databases.

Data sources in the Solution Explorer in BIDS

Figure 5.2. Data sources in the Solution Explorer in BIDS

Data sources are pretty much as you'd expect; they capture the connection string and authentication info for a server. SSAS data sources are limited to .NET and OLE DB providers—no ODBC. When you install the provider, you'll find that the wizard just picks it up directly and it's available in the selector in the wizard (Figure 5-3).

Selecting a provider for a new data source

Figure 5.3. Selecting a provider for a new data source

You can see in Figure 5-3 that in addition to the SQL .NET providers, there's also an Oracle .NET provider. Under OLE DB we have providers for Jet (Access), SSAS, Data Mining, DataShape (for hierarchical record sets), Oracle, MySQL, and SQL Server. Selecting a provider will load the appropriate UI for the connection information.

Creating a data source is pretty straightforward—the designer is just two panels. The first page (Figure 5-4) enables you to build the connection string with the standard Windows connection manager. The Data Source References section in the center of the page can maintain a reference to another data connection in the same project or even another SSAS project. So, for example, if you want to have two data connections with the same connection information but using different impersonation properties (to use different user accounts on a database), you could link them this way and just make changes in one location.

The General tab of the Data Source Designer

Figure 5.4. The General tab of the Data Source Designer

Remember that the next set of options are about an OLE DB/.NET connection, not OLAP. This is a simple data connection to a database. The Isolation level lets you enable snapshot isolation on the SQL connection, reducing row locking and contention. You can also set the timeout on the query (by default, 0 seconds means the query won't time out). Finally, you can set the maximum number of connections and a logical description.

The second tab of the designer is dedicated to the identity the connection will use when connecting. Let's look at the four options here:

Use a specific Windows username and password:

SSAS will use the provided credentials for connecting to the data source.

Use the service account:

The default selection, this connection will use the account information set for the Analysis Services service account. To change that account, choose Start → Administrative Tools → Services. Alternatively, press the Windows key with the R key to open the Run dialog box, and then type services.msc and press Enter. Select the SQL Server Analysis Services service, right-click and select Properties, and then select the Log On tab.

Use the credentials of the current user:

Analysis Services will use the current user's credentials for Data Mining Extensions (DMX) queries, local cubes, and mining models. However, this option isn't valid for any action that interacts with actions that may have to cross machine boundaries—processing, ROLAP queries, linked objects, remote partitions, and synchronization from target to source.

Inherit:

This option will pass through the credentials of the user accessing the cube or data-mining model. (See the following sidebar, "That Old Double-Hop Problem.")

Our first exercise will be a short one—let's set up a data source for the AdventureWorks DW database (see Appendix A to set up AdventureWorks). Follow the steps in Exercise 5-1.

Data Source Views

Now that we have a data source, we'll need a data source view on which to build our dimensions and cubes. As I've mentioned before, you can build a data source view from multiple data sources. Simply create the additional data sources in the same manner.

The data source view defines the relational schema on which you build your cubes (consisting of measures and dimensions). As an abstraction point between an underlying database schema and the cube, the DSV allows the joining of tables that do not have relationships defined in the database, creation of primary keys on tables, calculated fields, and named queries to act as a view on data.

For the most part, the DSV is about building this relational model. If you've used a relational modeling tool in the past, it should be pretty familiar. We'll take a quick tour and then create a data source view for our cube.

Designer Tour

As I mentioned, the DSV designer is similar to any relational model designer—add tables and link them. If you add tables that already have relationships in the underlying data source, those relationships will be shown in the DSV. You can also add relationships in the DSV, but this is to inform links; it won't create any actual constraints on the tables.

The DSV designer is shown in Figure 5-8. The central pane is the actual design surface, as you can see from the tables and relationships shown there. To the top left is the Diagram Organizer. When you get large, complex relational structures in your data source view, you can create additional diagrams to focus on subsets of the tables. The AdventureWorks SSAS solution has several functional subviews.

The data source view (DSV) designer

Figure 5.8. The data source view (DSV) designer

To the lower left of the designer is a table browser. You can use this for reference or drag tables from the browser to the design surface. You can also check the properties of fields by right-clicking on a field and selecting Properties. BIDS also enables viewing the data in the table: right-click on the table and select Browse Data.

Finding Tables

If you do have a large, complex diagram and frequently need to move around looking for various tables, there are a few easy navigation methods. First and easiest—if you click on a table name in the tables pane, the diagram will immediately show that table if it's in the view. You can also click the Find Table button

Finding Tables

The other (really cool) way to navigate a large diagram when you're zoomed in is via the arrow in the lower-right corner of the DSV designer. When you click on the arrow, you'll see an overview of the data source view with an outline of the current view you can move to change the view (Figure 5-9).

Using the DSV table navigator

Figure 5.9. Using the DSV table navigator

Replacing a Table

You may need to replace a table in a data source view—perhaps with a view that has other information, or a table from another data source. If you right-click on a table, there's an option to replace the table with another table or a named query.

Warning

If you replace the table with another table that is significantly different, you stand a good chance of corrupting any dimensions or cubes that are based on that table.

Now let's take a look at creating a data source view, in Exercise 5-2.

Named Calculations and Queries

Occasionally you may need a column that can be derived from existing values, or a single table composed of columns from other tables. You can accomplish these in the data source view with named calculations and queries.

Named Calculations

For example, invoice line item data often doesn't include the subtotal; business systems calculate it on the fly. We could calculate it in the cube, as you'll see in Chapter 7, but for something that's relatively fixed (such as a subtotal value), why not calculate it at the data source and not waste processing power recalculating it over and over?

Analysis Services lets you calculate a value in the data source view by creating a named calculation. We effectively define a dynamic field in the table in the data source view that does the math when the data is read. Using this, we can create concatenated fields (full name consisting of first name plus last name), adjusted values (age from date of birth, aligning date/time values to join tables), or various calculated values.

Named calculations are defined in the data source view. You can create a named calculation by right-clicking on a table in the designer and selecting New Named Calculation. This will open the Named Calculation dialog box (Figure 5-13). It's pretty straightforward—name and description, and the expression for the calculation.

Note

The calculation must be a standard SQL expression that returns a scalar value.

Creating a named calculation

Figure 5.13. Creating a named calculation

In our data source view, we have a table for Employees. The table has a start date and end date, but perhaps we want to know the number of years the employees have been on the job. For employees still in the position, the end date is null, so we'll have to use today's date. If we're looking for the number of years, we can create a named calculation, which will recalculate whenever we process the dimension. So let's look at how we create the named calculation in Exercise 5-3.

Named Queries

The counterpart of a named calculation is a named query. This is effectively a SQL view as part of the data source view. The named query designer (Figure 5-15) will look very familiar to anyone who has designed a query in Access, SQL Server Management Studio, Reporting Services, and so on.

The named query designer

Figure 5.15. The named query designer

One of the top uses of a named query is to make data more readable. As shown in Figure 5-15, we want to show a label in the table instead of a numeric foreign key. Another use would be to merge tables for use in a star schema. Although a normalized database would have separate tables for county, state, and country, in our star schema we want a single table listing counties and indicating the state and country value for each. (You'll learn how to break this back down in Chapter 7.) After you create a named query, it will appear to subsequent wizards just as another table.

Let's add the Reseller table to our data source view, and include the geographic information about the reseller. Follow along in Exercise 5-4.

Summary

Now that you understand data source views, you can see how they can be used to aggregate data from multiple data sources. Also, by using multiple data source views, you can provide different ways of looking at data without actually creating different databases and duplicating the data itself. The data source view is a key concept in the universal data model, by providing a way of severing data without duplicating it. The "model" part is formed by the dimensions and measures of the cube itself. Let's move on to understanding dimensions by creating some.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.25.217