Chapter 3. SQL Server Analysis Services

Now that you have a fundamental understanding of OLAP and multidimensional analysis, let's start to dig into the reason you bought this book: to find out how these OLAP technologies are implemented in SQL Server, specifically SQL Server Analysis Services (SSAS). SSAS really came into its own in SQL Server 2005, which was a massive overhaul of the entire data platform from SQL Server 2000. SQL Server 2008 Analysis Services is more evolutionary than revolutionary, but it still offers significant improvements and additions over the 2005 release.

I wrote this chapter from the perspective of SSAS in the 2008 version (formerly code-named Katmai). If you're familiar with the 2005 version of SQL Server Analysis Services (formerly code-named Yukon), you may just want to skip to the last section, where I call out the specific improvements in SQL Server 2008 Analysis Services.

Requirements

Before I dive into the "all about SQL Server Analysis Services" stuff, you may want to install it. For a detailed overview and instructions regarding installation of SQL Server 2008 and SSAS, see the SQL Server 2008 Books Online topic "Initial Installation" at http://msdn.microsoft.com/en-us/library/bb500469.aspx. I'll cover some of the high points here.

Hardware

I get a lot of questions about what kind of hardware to use for an Analysis Services installation. The answer is, "It depends." It depends on these factors:

  • How many users you plan to support (and how quickly)

  • What types of users you need to support (lightweight read-only reporting, or heavy analysis?)

  • How much data you have

  • How fast you expect the data to grow

Generally, the hardware decision boils down to one of three scenarios:

New business intelligence initiative:

Smallish amount of data, pilot group of users (fewer than ten).

Business intelligence initiative to satisfy large demand:

For example, the current user base is using Excel spreadsheets or an outdated software package against a fairly large existing body of data. So although there's no current solution, you anticipate that when a solution is introduced, it will see rapid adoption.

Replacing an existing solution:

In this case, there is generally a large body of existing data that sees heavy usage from a large number of users.

The first and third scenarios are the easiest to deal with. For the first scenario, you can start with a single server and either install all the software on the physical machine or set up a virtual environment reflecting a more mature architecture but on a single physical host (more on virtualization in a moment). In the third scenario, you'll have to do the hard-core analysis of the needs for data storage, data growth, and usage. In other words, you know the answers—you just have to translate them.

The second scenario is the scary one. Your options seem to be either to spend a ton of money on a large-scale implementation, or to run the risk of setting up an architecture that your users outgrow very quickly. The best approach here is to plan a small pilot and measured growth to a full implementation, with provisions for scaling as necessary as usage and data storage needs grow.

Having said that, the minimum hardware requirement for SQL Server Analysis Services is a single 1GHz single-core CPU with 512MB of RAM. Obviously, this is fairly silly; it's almost impossible to buy a server that doesn't meet these specifications unless you're shopping on eBay. My personal recommendation for the hardware for a SQL Server Analysis Services implementation is as follows:

  • Two dual-core CPUs. Multiple cores are great for multithreading, but the I/O and cache architecture around discrete physical CPUs provides better scalability. An alternative, should cost be an issue, would be to start with a single dual-core CPU and plan to add a second when necessary. (Also be sure to verify that your server will accept quad-core CPUs, and keep an eye on the coming advances in eight-core CPUs and beyond.)

  • 4GB RAM, with capability to grow to 64GB. SSAS is an extremely memory-hungry application.

  • For the disk system, I'm partial to two drives in RAID 1 (mirrored) for the system drive, and then a RAID 5 array for data. Some consider this fairly complex for monitoring and management, so a single RAID 5 or RAID 10 array can also serve. Analysis Services reads more than it writes, so read speed is far more important than write speed.

Virtualization

I mentioned virtualization earlier. Virtualization was made popular by VMware over the last ten years; Microsoft now offers Virtual Server on Windows Server 2003 and Hyper-V on Windows Server 2008. I'm not sure that virtualization is a good idea with SSAS in the grand scheme of things. It's such a resource-intensive service that you'll generally lose more than you gain. The only time I would advocate it is if you're just starting out; you could set up a virtualized network on a single server, and then move virtual machines to physical machines as necessary (see Figure 3-1).

Scaling up from virtual to physical

Figure 3.1. Scaling up from virtual to physical

In Figure 3-1, the solution was originally set up as five virtual servers on a single physical box. As the solution grew, the first places we saw limitations were the SSAS box (RAM) and the OLAP relational store (hard drive space and I/O speed). So in a planned migration, we back up each server and restore it to a new physical server to provide the necessary growth and headroom.

Note

The Microsoft support policy for virtualization can be found at www.microsoft.com/sqlserver/2008/en/us/virtualization.aspx. The basic policy is that SQL Server 2008 is supported on Hyper-V guests. For other virtualization solutions (for example, VMware), Microsoft's support is best effort: if an issue can be resolved in the virtual environment, Microsoft will do its best to assist. If at any point it appears the problem may be related to the virtualization environment, however, you'll be required to reproduce the problem directly on hardware.

Software

To answer the first question in the realm of the 2008 Servers: No, you can't install SQL Server 2008 on Windows Server 2008 Core. You can install it on Windows Server 2003 SP2 or later, or Windows Server 2008. SQL Server Standard Edition can also be run on Windows XP SP2 or Vista. SQL Server x86 (32-bit) can be run on either x86 or x64 platforms, while x64 (64-bit) can run only on x64 platforms.

Tip

Although SQL Server 2008 is supported on a domain controller, installing it on one is not recommended.

SQL Server setup requires Microsoft Windows Installer 4.5 or later (you can be sure that the latest installer is installed by running Windows Update). The SQL Server installer will install the software requirements if they're not present, including the .NET Framework 3.5 SP1, the SQL Server Native Client, and the setup support files. Internet Explorer 6 SP1 or later is required if you're going to install the Microsoft Management Console, Management Studio, Business Intelligence Development Studio, or HTML Help.

Note

Installation of Windows Installer and the .NET Framework each require rebooting the server, so plan accordingly.

Upgrading

Upgrading from SQL Server 2000 to 2005 was a fairly traumatic experience, because of the massive architecture changes in the engine, storage, and features. Although some features have been deprecated or removed in SQL Server 2008 as compared to 2005, the migration is far smoother.

The bottom line with respect to upgrading: If you have SQL Server 2005 installations that you have upgraded from SQL Server 7 or 2000, the migration to 2008 should be much easier. More important, if you have current SQL Server 2000 installations and you are evaluating migration to SQL Server 2005, you should move directly to SQL Server 2008.

Consider one more point when evaluating upgrading from SQL Server 2005 to 2008. A number of my customers have only recently finished upgrading to SQL Server 2005 and are understandably concerned about another migration effort so soon. There is no reason your server farm has to be homogeneous—for example, you could upgrade your Analysis Services server to 2008 while leaving the relational store at 2005. Evaluate each server role independently for upgrade, because each role offers different benefits to weigh against the costs.

Resources for upgrading to SQL Server 2008 can be found at http://msdn.microsoft.com/en-us/library/cc936623.aspx, including a link to the Upgrade Technical Reference Guide.

Standard or Enterprise Edition?

When you decide to go with SQL Server 2008 Analysis Services, a big decision to make is whether to go with Standard or Enterprise Edition. In general, Standard Edition is for smaller responsibilities, and Enterprise Edition is for larger, more mission-critical jobs. One easy way to differentiate is to ask yourself, "Can I afford for this server to go down?" If not, you probably want to look at Enterprise Edition.

Note

The full comparison chart for SQL Server's Standard and Enterprise Editions is at www.microsoft.com/sqlserver/2008/en/us/editions.aspx.

With SQL Server 2000, the primary differentiator was that you could cluster Enterprise Edition while you couldn't cluster Standard Edition. That alone was pretty much the deal-maker for most people. In SQL Server 2005, you could cluster Standard Edition to two nodes, which seemed to remove a lot of the value of Enterprise Edition (not quite true—there are still a lot of reasons to choose Enterprise Edition).

SQL Server 2008 adds a lot of features, and a majority of them are only in the Enterprise Edition. From an Analysis Services perspective, features that are available only in Enterprise Edition include the following:

Scalable shared databases:

In SQL Server 2005, you could detach a read-only database and park it on a shared cluster for use as a reporting database. In SQL Server 2008, you can do this with a cube after the cube is calculated. You detach the cube and move it to central storage for a farm of front-end database servers. Users can then access this farm by using tools such as Excel, ProClarity, or PerformancePoint for analysis and reporting.

Account intelligence:

This feature enables you to add financial information to a dimension that specifies account data and then sets the dimension properties to be appropriate to that account type. For example, a "statistical" account type would have no aggregation, whereas an "asset" account type would be set to aggregate the last nonempty member (similar to an inventory calculation).

Linked measures and dimensions:

I've explained that instead of having one large cube, you often want to create several smaller cubes. However, you may have shared business logic or dimensions (who wants to create and maintain the corporate structure dimension over and over?). Instead, you can create a linked measure or linked dimension, which can be used in multiple cubes but maintained in one location.

Semiadditive measures:

As I mentioned in Chapter 2, you won't always want to aggregate measures across every dimension. For example, inventory levels shouldn't be added across a time dimension. Semiadditive measures let a measure aggregate normally across most dimensions but apply a different function (such as taking the last nonempty value) along the time dimension.

Perspectives:

When considering cubes for large organizations, the number of dimensions, measures, and members can get pretty significant. The AdventureWorks demo cube has 21 dimensions and 51 measures, and it's focused on sales. Browsing through dozens or hundreds of members can get old if you have to do it frequently. Perspectives offer a way of creating "views" on a cube so that users in specific roles get shorter, focused lists of dimensions, measures, and members suiting their role.

Writeback dimensions:

In addition to being able to write back to measures, it's possible to enable your users to edit dimensions from a client application (as opposed to working with the dimension in BIDS). Note that dimension writeback is possible only on star schemas.

Partitioned cubes:

Also mentioned in Chapter 2, the ability to partition cubes makes maintenance and scaling of large cubes much, much easier. When you can shear off the last 12 years of sales data into a partition that has to be reprocessed only on rare occasions, you do a lot for the ability to rebuild the current partition more often.

Architecture

SQL Server Analysis Services runs as a single service (msmdsrv.exe) on the server. The service has several components, including storage management, a query engine, an XMLA listener, and security processes. All communication with the service is via either TCP (port 2383) or HTTP.

The Unified Dimensional Model

A major underlying concept in Analysis Services is the unified dimensional model, or UDM. If you examine more-formal business intelligence, data modeling, or OLAP literature, you will often find something similar to Figure 3-2. Note the requirement for a staging database (for scrubbing the data), a data warehouse (for aggregating the normalized data), data marts (for separating the data into more-manageable chunks), and finally our OLAP store. I have seen architectures with even more data redundancy!

A traditional BI architecture

Figure 3.2. A traditional BI architecture

Apart from the duplication of data (requiring large amounts of disk space and processing power to move the data around), we also have the increased opportunity for mistakes to surface in each data translation. But the real problem we face is that systems like these often seem to end up like Figure 3-3. Various emergent and exigent circumstances will create pockets and pools of data, and cross connections, and it will all be a mess.

Does this look familiar?

Figure 3.3. Does this look familiar?

SSAS is designed to conceptually unify as much of Figure 3-2 as possible into a single dimensional model, and as a result make an OLAP solution easier to create and maintain. Part of what makes this possible is the data source view (DSV), which is covered in Chapter 5. The DSV makes it possible to create a "virtual view," collating tables from numerous data sources. Using a DSV, a developer can create multiple cubes to address the various business scenarios necessary in a business intelligence solution. The net result is Figure 3-4: less data redundancy and a more organized architecture.

How Analysis Services enables the unified dimensional model

Figure 3.4. How Analysis Services enables the unified dimensional model

As I've mentioned previously, in many cases the data in the source systems isn't clean enough for direct consumption by a business intelligence (BI) solution. In that case, you will need a staging database, which is designed to be an intermediary between the SSAS data source view(s) and the source systems. This is similar to Figure 3-5, which also shows various clients consuming the cube data.

Using a staging database to clean data before the SSAS server

Figure 3.5. Using a staging database to clean data before the SSAS server

There is still a lot of potential for complexity. But I hope you see that by using one or more data source views to act as a virtual materialized view system, combined with the power of cubes (and perspectives, as you'll learn later), you can "clean up" a business intelligence architecture to make design and maintenance much easier in the long run.

Logical Architecture

Figure 3-6 shows the logical architecture of Analysis Services. A single server can run multiple instances of Analysis Services, just as it can run several instances of the SQL Server relational engine. (You connect to an Analysis Services instance by using the same syntax: [server name]\[instance name].) Within each instance is a server object that acts as the container for the objects within.

Each server object can have multiple database objects. A database object consists of all the objects you see in an Analysis Services solution in BIDS (more on that later). The minimum set of objects you need in a database object is a dimension, a measure group, and a partition (forming a cube).

SQL Server Analysis Services logical architecture

Figure 3.6. SQL Server Analysis Services logical architecture

I've grouped the objects in a database into three rough groups:

OLAP objects:

Consisting of cubes, data sources, data source views, and dimensions, these are the fundamental objects that we use to build an OLAP solution. This is an interesting place to consider the object model as it relates to our OLAP world (Figure 3-7).

The database object model

Figure 3.7. The database object model

Note that the Dimension collection is not a member of the Cube class, but an independent collection under the Database class. This speaks to the way that dimensions are created and can be shared among different cubes in the same database. The Cube class then has a collection of CubeDimension objects, which are references to the corresponding Dimension objects.

The Cube class does own its MeasureGroups collection; each MeasureGroup in turn contains the Measure objects. The same applies to the Perspectives and CubePermissions collections.

Data-mining objects:

This is pretty much the MiningStructure collection and the subordinate object hierarchy. A mining structure contains one or more MiningModel objects, as well as the columns and bindings necessary to map a mining model to the data source view. Chapter 11 covers data mining in depth.

Helper objects:

Something of an "everything else" catchall. The helper objects consist of a collection of Assembly objects, DatabasePermission objects, and Role objects for managing security. An Assembly object represents a .NET assembly installed in the database.

You may ask, "What do these object models do for me?" In SQL Server Analysis Services, stored procedures can provide functions that implement business rules or requirements too complex for SSAS to accomplish easily. Perhaps you need to run a query that calls a web service and then retrieves a data set from a relational database based on the results of that query. You could create a stored procedure that accepts parameters and returns a data set, and then call that procedure from MDX in a KPI or a calculated measure.
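To make that concrete, here's a minimal sketch of calling such a function from MDX. The FinanceLib assembly and its GetAdjustmentFactor function are hypothetical stand-ins for whatever .NET assembly you register in the database; the AssemblyName.FunctionName() calling convention inside a calculated member is the real SSAS pattern:

    // FinanceLib is a hypothetical .NET assembly registered in this database
    WITH MEMBER Measures.[Adjusted Sales] AS
        FinanceLib.GetAdjustmentFactor("FY2008") * Measures.[Internet Sales Amount]
    SELECT { Measures.[Adjusted Sales] } ON COLUMNS
    FROM [Adventure Works]

The function is evaluated on the server as the query runs, so anything it does (such as calling out to a web service) happens inside the query's execution.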

Physical Architecture

As I've mentioned previously, SQL Server Analysis Services runs as a single Windows service. The service executable is msmdsrv.exe, the display name (instance name) is SQL Server Analysis Services, and the service name is MSSQLServerOLAPService. The default path to the executable is as follows:

C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\bin

That service has an XMLA listener that handles all communications between the SSAS service and external applications. The XMLA listener defaults to port 2383, and can be changed either during setup or from SQL Server Management Studio (SSMS). The location of database data files can also be changed in SSMS; Chapter 4 covers that in more detail.

If you've ever had to root around the SQL Server file system, there's some great news with SQL Server 2008. With previous versions of SQL Server, folders for additional services (Analysis Services, Reporting Services, Integration Services) were simply added to the Microsoft SQL Server folder with incrementing suffixes (see Figure 3-8). You would have to open each folder to find the one you were looking for.

Folder hierarchy in SQL Server 2005

Figure 3.8. Folder hierarchy in SQL Server 2005

In SQL Server 2008, the folder-naming conventions are far more intuitive (see Figure 3-9). You will have folders for MSSQL10, MSAS10, and MSRS10. In addition, you can see that the service has the instance name in the folder, such as MSAS10.MSSQLSERVER (MSSQLSERVER being the tag for the default instance).

Folder naming in SQL Server 2008

Figure 3.9. Folder naming in SQL Server 2008

The startup parameters for the SSAS service are stored here:

MSAS10.<instance>\OLAP\Config\msmdsrv.ini

This is an XML file. Most notable here are the DataDir, LogDir, and AllowedBrowsingFolders tags. In case of gremlins, it's good to verify that these entries are what you think they are. You should also verify which INI file the service is loading by checking the properties for the SQL Server Analysis Services service. You'll see Path to Executable, as shown in Figure 3-10.

SQL Server Analysis Services Windows Service properties

Figure 3.10. SQL Server Analysis Services Windows Service properties

You'll probably have to highlight and scroll to see the whole path. You should have something like "C:\[path]\msmdsrv.exe" -s "C:\[path]\Config", where the Config folder is the location of the msmdsrv.ini file. If you need to change this file location, you can use msmdsrv.exe on the command line to unregister the service, and then re-register it with the new INI file location. (Use msmdsrv /? to see the command-line options.)

Warning

Do not change the INI file location unless you absolutely need to in order to address a problem. You could easily put the SSAS service in an unusable state.
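For orientation, here's roughly what the entries mentioned above look like inside msmdsrv.ini. The tag names (DataDir, LogDir, AllowedBrowsingFolders) are the real ones; the paths shown are only illustrative defaults, and the actual file holds many more settings:

    <ConfigurationSettings>
      <!-- Where SSAS keeps its catalogs (see the Storage section below) -->
      <DataDir>C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Data</DataDir>
      <LogDir>C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Log</LogDir>
      <!-- Pipe-delimited list of folders offered in storage-location dialogs -->
      <AllowedBrowsingFolders>C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Backup\</AllowedBrowsingFolders>
    </ConfigurationSettings>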

So now let's take a look at where Analysis Services stores all its data.

Storage

When considering storage of SSAS solutions, you have the actual data, the aggregation values, and the metadata of the solution. Each of these is handled separately by Analysis Services. How they're handled depends on the storage mode you choose: ROLAP, MOLAP, or HOLAP.

The default storage option in SSAS is MOLAP. The M is for multidimensional. In MOLAP storage, Analysis Services keeps everything in its data stores: the metadata defining the cube solution, a copy of the data, and the precalculated aggregations from the data.

In ROLAP (relational), the metadata defining the object is stored in the SSAS data store, but the data source isn't cached. The live data from the relational source is used, and the aggregations are calculated on-the-fly.

HOLAP is a mix of the two (H is for hybrid). The aggregations are stored in multidimensional format, but the source data is retained in the original data store. SSAS offers additional options in which the measure group data is stored in SSAS storage, but the source data is monitored for changes, and the cube is reprocessed dynamically based on the amount of data changed.

With the exception of ROLAP and the data for HOLAP, SQL Server Analysis Services stores its data in the file system. The administrative and developer access to all SSAS structures and data is through the SQL Server Management Studio and Business Intelligence Development Studio. As we've discussed, all these interfaces operate by using XMLA via the SSAS service. Although you may be used to SQL Server storing databases in a single data file (or a small number of files if you're using file groups), SSAS starts its optimization by storing its data in a structured system within Windows file folders.

The root for SSAS storage is going to be the location indicated in the StorageLocation setting for the structure selected. The default value is set at the server level in the DataDir property (Figure 3-11). You can access the property dialog box by right-clicking on the server in SQL Server Management Studio and selecting Properties.

Setting the default data directory in SSAS server properties

Figure 3.11. Setting the default data directory in SSAS server properties

The cube and measure group metadata locations can be set in the StorageLocation properties for each. Setting this property opens a dialog box that lists the folders available for locating files (Figure 3-12).

Selecting the storage location for SSAS objects

Figure 3.12. Selecting the storage location for SSAS objects

You can set the folders that are listed by changing AllowedBrowsingFolders in the advanced properties for the SSAS server in SQL Server Management Studio. (There is a check box for Show Advanced Properties near the bottom of the properties dialog.)

Under the DataDir, SSAS will create a folder for the database, or catalog (the logical structure equivalent to the SSAS project you'll discover in Chapter 4). This folder will have the name of the catalog ending in an index (a version number) and a .db extension. Under this folder will be XML files representing each object in the solution. These are effectively header files, containing the properties and description of each object. In this folder, you'll see files ending in .cub.xml for cubes, .ds.xml for data sources, .dms.xml for data-mining structures, and so on.

Of more interest are the subfolders in our main catalog folder (Figure 3-13). There's a subfolder for each object in the catalog. We're primarily interested in two types of folders: cubes (.cub) and dimensions (.dim).

File storage for SSAS catalogs

Figure 3.13. File storage for SSAS catalogs

There will be a .cub folder for every cube in the catalog. Inside the .cub folder you'll find folders for each measure group ending in .det. You'll also find XML files—one for each measure group again (*.det.xml), one for each perspective on the cube (*.persp.xml), and an info file. The info.[version].xml files are effectively the header files for the given folder they're in. Each .det folder will have a subfolder and XML header file for each partition in the measure group.

Now before we dive into the partition folders, let's take a look at how the partitions are defined for the AdventureWorks cube we'll be working with in this book (see Table 3-1). Note the estimated number of rows in each partition—just over a thousand in 2001, and almost three thousand in 2002. But in 2003 and 2004, we have 32,265 rows. (These are the same because the aggregation design set an upper limit.) So how does this affect our storage?

Table 3.1. Internet Sales Partitions

Partition Name         Size     Rows     Storage Mode
Internet_Sales_2001    28.9KB    1,013   MOLAP
Internet_Sales_2002    77.1KB    2,677   MOLAP
Internet_Sales_2003    1.4MB    32,265   MOLAP
Internet_Sales_2004    1.5MB    32,265   MOLAP

Let's check the prt (partition) folder for these partitions (Figures 3-14 and 3-15).

Folder contents for the 2002 partition

Figure 3.14. Folder contents for the 2002 partition

Folder contents for the 2003 partition

Figure 3.15. Folder contents for the 2003 partition

What happened? Remember the difference in row counts? For smaller data sets, SSAS may have only three files: the ever-present info.[version].xml, a data file, and a header (hdr) file. The header file is the index indicating where the data is stored in the .data file. However, when we get into larger data sets, scanning files looking for data becomes very inefficient. So Analysis Services uses bitmap indexes for each dimension. Instead of scanning through gigabytes of data files, SSAS parses the dimensions and members needed from the query, and then reads the dimension map file for each dimension, determining the data page number for each member.

SSAS then joins the lists of page numbers, and has to scan only the data pages that resolve from all dimensions queried. For example, if the queried members of the products dimension need values from pages 12, 34, and 46, but the queried members from the geography dimension need values only from pages 34, 46, and 57, then the engine needs to retrieve only the values from pages 34 and 46, where the sets intersect.

Dimensions work in a similar fashion. Each .dim folder will have the ubiquitous info.[version].xml file as well as a number of files for each level of each attribute or hierarchy of the given dimension. The files (which are encoded binary) all have extensions ending in store for storage. The extensions and their meanings are in Table 3-2.

Table 3.2. Dimension Storage File Extensions

Extension    Meaning
.kstore      Key store
.ksstore     Key string store
.khstore     Key hash store
.astore      Property store
.asstore     Property string store
.ahstore     Name hash table
.hstore      Hole store
.sstore      Set store
.lstore      Structure store
.ostore      Order store
.dstore      Decoding store

The most interesting thing to understand is that Analysis Services consists of a Windows service and all these files for indexing and storage. To run SQL Server Analysis Services, you do not need a SQL Server RDBMS instance running on the server. (You'll still need one somewhere if you plan to keep your source or staging data there.)

Now that you have a solid understanding of what SSAS looks like from the underside, let's look at an OLAP solution from the perspective we'll usually see it.

Cube Structures in SSAS

When you're creating an Analysis Services cube, you'll pretty much always be working through the Business Intelligence Development Studio, or BIDS. I'll be using BIDS here for a basic walk-through of the developer's eye view of an SSAS solution, but I'll cover BIDS in depth in Chapter 4.

Figure 3-16 shows the AdventureWorks cube open in BIDS. The central area shows the data source view (more on that in a moment.) To the left are panes that show measure groups and measures, as well as dimensions and hierarchies. On the right is the Solution Explorer, which details all the objects in our solution, including data sources, data source views, cubes, dimensions, data-mining structures, security roles, and assemblies.

An Analysis Services cube in the Business Intelligence Development Studio

Figure 3.16. An Analysis Services cube in the Business Intelligence Development Studio

Figure 3-17 shows the Solution Explorer rolled up.

The items in the Solution Explorer

Figure 3.17. The items in the Solution Explorer

What I really like about the Solution Explorer is that it's like a checklist for creating a cube; you create data sources, and then a data source view by using the tables from the data sources. You can then create a cube directly and use the cube wizard to generate dimensions. Finally, you can create mining structures on your cube, assign security roles, and add assemblies for advanced capabilities in the cube.

Data Sources

Data sources are the reason we're going through this exercise in the first place! You create data sources to connect to the places you'll be pulling data from. You can essentially connect to anything you can create an OLE DB connection to. The connections are used when the data is read. For MOLAP, that's generally only when the cube is processed; for HOLAP or ROLAP, it may be any time a user performs analysis on the cube.

After you have one or more data sources, you need a way to weave the data together. Similar to a relational database where you can combine multiple flat tables into a view, in Analysis Services we have the data source view.

Data Source View

The easiest way to think of a data source view is to picture the database diagram tool in SQL Server Management Studio. A data source view (DSV) lets you add multiple tables and views from data sources to a canvas (Figure 3-18). You can either maintain existing relationships from their original data source, or you can create relationships in the DSV.

A data source view (DSV)

Figure 3.18. A data source view (DSV)

You can even create new views (known as named queries here) directly in the DSV from existing tables in data sources. Creating a new named query will give you the old faithful query designer and allow you to add existing tables and join them, and then select the fields for your DSV.
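Under the covers, a named query is just a SELECT statement stored in the DSV. As a sketch, a named query that flattens order headers and line items into one logical table might look like the following (the table and column names assume the standard AdventureWorks OLTP schema):

    -- One row per order line, with the order date carried down from the header
    SELECT soh.SalesOrderID,
           soh.OrderDate,
           sod.ProductID,
           sod.OrderQty,
           sod.LineTotal
    FROM Sales.SalesOrderHeader AS soh
    JOIN Sales.SalesOrderDetail AS sod
      ON sod.SalesOrderID = soh.SalesOrderID;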

The nice thing about the DSV is that you don't have to aggregate all your data into a staging database just to build a cube. If your data is in a state to do so, you can effectively create a "virtual" staging database in the DSV. All the data will be read in accordance with the schema in the DSV, but you didn't have to create a second copy just to stage the data. You may also hear this capability referred to as the unified dimensional model, or UDM.

The Cube Structure Itself

After we have a data schema, we can build one or more cubes. BIDS has a very nice cube design wizard that can walk you through creating a workable cube. The wizard will evaluate the data in your DSV, recommend measures and dimensions, and create the supporting structures. As you gain more experience, of course, you'll want to fine-tune things yourself.

Just a reminder that while the cube "owns" the measures and measure groups, it does not own the dimensions. Dimensions are created as equivalent objects and then associated with the cube. After a dimension is associated with a cube, you will need to indicate the relationship between the dimension and each of the measures in the cube (see Figure 3-19 for how this works).

Associating dimensions with measures

Figure 3.19. Associating dimensions with measures

When you associate a dimension with a measure, you have a choice of ways to relate the two:

No relationship:

There is no relationship between the dimension and the measure. Attempting to slice the measure with the dimension will have no effect (all the values will be the same).

Regular:

The most common, this is a standard relational relationship between two tables.

Fact:

This means that the measure and the dimension are based on the same table. For example, in a purchase order scenario, the table with line items may have both cost (measure) and product (dimension) information.

Referenced:

In this case, there is an intermediate table between the measure table and the dimension table. For example, if we want to break down sales by geography, we will have to relate the Orders table to the Geography table via the Customer table (orders.customerID → customer.customerID; customer.StateID → states.StateID). See the SQL sketch after this list for how that join chain resolves.

Many-to-many:

Beware, this kind of arrangement can cause problems with local cubes! In a many-to-many join, the fact table is joined to an intermediate dimension table, which is joined to an intermediate fact table, which is joined to the final dimension table.

Data mining:

This connection leverages a mining model to make the connection. I'll cover the implications of this in Chapter 11.
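To make the referenced relationship concrete, here is the join chain SSAS effectively resolves for the geography example above, sketched as SQL. The tables and the SalesAmount column are the hypothetical ones from that example, not real AdventureWorks objects:

    -- Customer is the intermediate table between the fact (orders) and the dimension (states)
    SELECT st.StateID,
           SUM(o.SalesAmount) AS TotalSales
    FROM orders AS o
    JOIN customer AS c ON o.customerID = c.customerID
    JOIN states AS st ON c.StateID = st.StateID
    GROUP BY st.StateID;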

Cubes can also have calculated measures, key performance indicators (KPIs), and actions. A cube can have multiple perspectives, as I've mentioned, to simplify the user interface for an especially complex cube. You can have translations to offer multilingual cube solutions. These are all features of cubes and found in the cube designer in BIDS.

Dimensions

Although the cube wizard may automatically create some dimensions, dimensions are generally designed separately from cubes in SSAS. Figure 3-20 shows a dimension in the dimension designer. You can see the data source view for the dimension on the right (dimensions are data-driven and so will need a reference to the data sources). The dimension itself, with attributes, is on the left. In the middle are the hierarchies for the dimension.

A dimension with several hierarchies in BIDS

Figure 3.20. A dimension with several hierarchies in BIDS

Mining Structures

Mining structures contain mining models—data structures that implement one of the many data-mining algorithms in SSAS to identify patterns in data or predict values in new data. Chapter 11 covers mining structures.

What's New in SQL Server 2008

Most of this chapter has been fairly agnostic between SQL Server 2005 and 2008. The basics and the architecture didn't change that much. So let's talk about some of the things that have changed.

Note

Most of the changes in SQL Server 2008 are incremental and additive, so think in terms of "adding a few features" instead of "massive migration pain."

Performance

Performance was one of the two areas the SSAS team focused heavily on (developer and administrator experience being the other). The goal was both to improve and optimize the engine and to provide more tools for cube developers and DBAs to get the maximum use out of their hardware. Following are just some of the performance enhancements in SQL Server 2008 Analysis Services.

Management Data Warehouse

The Management Data Warehouse (MDW) is a database inside SQL Server that provides for the collection of performance-related statistics. You can extend the statistics collection to include any metric. After you have the data in tables, you can build reports on it, or even build a cube for analysis.

To set up the MDW, you need access to an instance of SQL Server 2008. Under the Management folder, find the Data Collection node. Right-click on it and select Configure Management Data Warehouse (see Figure 3-21).

Configuring the MDW

Figure 3.21. Configuring the MDW

This starts a wizard through which you can either create an MDW, or start a collection to an existing MDW (so you can centralize statistics from several servers). After you create a database and assign user permissions, you'll need to run the configuration wizard again, but select the Configure Data Collection option.

Note

The SQL Server Agent must be running on the SQL Server that is hosting the MDW, or the data collection configuration will fail.

Now you'll have data collectors running on disk usage, query statistics, and server activity. You can prompt a collection and upload manually, script it, or use SSIS or a job to run the collection.

After some data has been collected, you can right-click on the Data Collection node, choose Reports → Management Data Warehouse, and then select a report. The Server Activity History report is shown in Figure 3-22. I've found the best source of information about Performance Studio is the hands-on lab at http://go.microsoft.com/?linkid=8316556.

A data collection report showing database sizes

Figure 3.22. A data collection report showing database sizes

Reliable Query Cancel

In SSAS 2005, trying to get a query to stop often took minutes (and often ended with the admin stopping the service and risking data corruption). In SQL Server 2008, canceling a query or connection should stop the query immediately.

Dynamic Management Views

Dynamic management views (DMVs) have been around in the relational engine for a while, but SQL Server 2008 is the first release in which we can use them in Analysis Services. Simply put, these are dynamic views that return management data. For example, open an MDX query window on an Analysis Services database and query SELECT * FROM $SYSTEM.MDSCHEMA_DIMENSIONS. You should get a result listing all the dimensions on the current server. This is simply easier than having to write a query to extract data from the system tables. You can find a list of DMVs in Books Online.
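A few other handy DMV queries, each run separately from the same MDX query window (these $SYSTEM rowsets ship with SSAS 2008; Books Online has the full list):

    -- Who is connected to this instance right now?
    SELECT * FROM $SYSTEM.DISCOVER_CONNECTIONS

    -- Active sessions, and the commands currently executing
    SELECT * FROM $SYSTEM.DISCOVER_SESSIONS

    SELECT * FROM $SYSTEM.DISCOVER_COMMANDS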

Tip

If you use DMVs a lot, you should check out the Analysis Services Stored Procedure Project on CodePlex at www.codeplex.com/ASStoredProcedures.

Block Computing—Optimization

Once again, let's consider our purchase order scenario. We have thousands of products and customers, and 365 days a year, and we want to calculate the sales tax for all purchases in 2008. Do we go cell by cell to calculate each value and then add them all together? Think in terms of customers shopping. Did most of our customers come by the store every day and buy most of our products? No—on average, say somewhere between 1 percent and 10 percent of our customers shop in the store, and they each buy a few products.

Most of our cube is empty.

Do we really want to go cell by cell? "On January 1, Mr. Smith did not buy any products—zero. On January 1, Mrs. Green did not buy any products—zero..." No, it makes more sense to calculate only the non-default values (we need to calculate the default value only once, not every time). SSAS 2008 now performs aggregations this way, optimizing the queries so that there is minimal grind through a sparse matrix of empty cells.

Writeback Performance Improvements

In SSAS 2005, writing back to a cube required updating the ROLAP (writeback) partition and then querying the MOLAP and ROLAP partitions together to effectively rebuild the picture. In SSAS 2008, the change is made to both the ROLAP and MOLAP partitions, so afterward the query goes to the MOLAP partition only.

Change Data Capture

Change Data Capture (CDC) is a new feature in SQL Server 2008 that provides a method of tracking changed data in relational tables. The benefit is that you can have a job running that incrementally updates your cubes without having to reprocess the whole thing.

Tip

Don't confuse Change Data Capture with Change Tracking. The latter captures which rows changed, but not the changed data itself.

You will have to enable CDC on the database and on each table you want to track by running system stored procedures, as sketched below. See Books Online for details on enabling CDC and on leveraging it in an Integration Services package.
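As a minimal sketch, enabling CDC for a hypothetical AdventureWorksDW source database and its FactInternetSales table looks like this; sys.sp_cdc_enable_db and sys.sp_cdc_enable_table are the real system procedures, and SQL Server Agent must be running for the capture job to be created:

    USE AdventureWorksDW;  -- hypothetical relational source feeding the cube
    GO
    EXEC sys.sp_cdc_enable_db;  -- enable CDC at the database level
    GO
    EXEC sys.sp_cdc_enable_table  -- start tracking changes on the fact table
        @source_schema = N'dbo',
        @source_name   = N'FactInternetSales',
        @role_name     = NULL;    -- NULL: don't gate access behind a database role
    GO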

Tools

SQL Server Management Studio (SSMS) and the Business Intelligence Development Studio (BIDS) truly benefited from being in the field since 2005. The SQL Server team took all lessons learned to heart and used them to improve the tools. Again, because this version is incremental instead of a full overhaul, we get the benefit of the improvements and added features without the learning curve of adopting another new user interface.

Here are some of the significant improvements in BIDS for Analysis Services.

Dimension Design

Designing dimensions was a little tricky in BIDS 2005. It was one of those "once you get it, you get it, but getting there isn't easy" things. The wizard has been cleaned up in 2008, and dimension configuration is much, much easier. Compare the first pages of the Dimension Wizard in BIDS 2005 (Figure 3-23) and BIDS 2008 (Figure 3-24).

The Dimension Wizard in BIDS 2005

Figure 3.23. The Dimension Wizard in BIDS 2005

The Dimension Wizard in BIDS 2008

Figure 3.24. The Dimension Wizard in BIDS 2008

You can see that from the first page, the wizard is cleaner and easier to follow. In addition to the wizard, there is a full-featured attribute relationship designer in the Dimension Editor (Figure 3-25). Creating attribute hierarchies used to be a serious pain, but with this designer, it's much more straightforward.

Attribute hierarchy designer in the Dimension Editor

Figure 3.25. Attribute hierarchy designer in the Dimension Editor

Chapter 6 covers dimension design.

Aggregation / UBO Designer

In BIDS 2005, we didn't have a lot of control over the storage of measures and measure groups. In 2008, there is a dedicated designer for aggregations (Figure 3-26), which also provides a wizard for usage-based optimization, or UBO (Figure 3-27).

Aggregation Designer in BIDS 2008

Figure 3.26. Aggregation Designer in BIDS 2008

The Usage-Based Optimization Wizard

Figure 3.27. The Usage-Based Optimization Wizard

The Aggregation Designer lets you easily combine partitions into single measures, as well as control the storage for the measures. The UBO Wizard helps you adjust storage requirements based on past query history, optimizing your cubes for your users' actual usage patterns. Of course, because the SQL team also invested significantly in improving initial aggregation designs, maybe you won't need these optimizations as much as you used to.

AMO Design Warnings

With SSAS 2005, I was often either stuck with a cube that didn't work as expected, or suspicious that I was doing something wrong. So I'd have to research the subject area I was having problems with and try to understand the whole field so I could figure out what I was doing wrong.

In BIDS 2008, there are more than 40 "best practice" design warnings that let you know when you're doing it wrong. You can access the full list of warnings from the Database menu in BIDS.

The warnings show up as a blue squiggle—pretty recognizable to anyone who's spent any time in Visual Studio (or Office!). See Figure 3-28 for an example.

An AMO "best practice" warning

Figure 3.28. An AMO "best practice" warning

Summary

That's a quick tour of some of the many improvements in SQL Server Analysis Services 2008. I'll review what's new in data mining in Chapter 11. Of course, a cube designer can also reap the benefits of the improvements in Integration Services, Reporting Services, and the relational engine, so don't neglect those!

In the next chapter, you will discover the SSAS developer and administration interfaces. I will introduce you to the Business Intelligence Development Studio, the SQL Server Management Studio, and PowerShell.
