25. Mashing Up Data with PowerPivot

PowerPivot is a free add-in for Excel 2010 brought to you by the SQL Server Analysis Services team at Microsoft. One of the themes for the 2010 release of Office was to improve Excel as a Business Intelligence tool. PowerPivot makes it possible to do jaw-dropping analyses in Excel.

Benefits and Drawbacks to PowerPivot

There are some pluses to PowerPivot but also a few minuses. But first, let’s start with the mega-pluses, the things that will make you love PowerPivot.

Mega-Benefits of PowerPivot

Following are five mega-benefits to using PowerPivot. Any one of these benefits are enough to make me upgrade to Excel 2010.

Process far more than a million rows of data—I’ve seen demos with 100 million rows. If you have data sets that extend beyond row 1048576, you can now sort, filter, scroll, and pivot those data sets in PowerPivot.

Create pivot tables from multiple tables, without writing a VLOOKUP—You no longer have to write processor-intensive VLOOKUP formulas to join data from two worksheets before creating a pivot table. PowerPivot takes your various Excel tables and mashes them together without you having to code VLOOKUPs.

Mash-up data from disparate sources—The PowerPivot window can import text, Access, RSS, SQL Server, and Excel data and present it in a single pivot table.

Get access to Sets—Microsoft added a cool Set feature to Excel 2010 pivot tables that allows asymmetric reporting. One problem: It works only with OLAP pivot tables, not regular Excel data. The good news: Take your regular data through PowerPivot and you’ve just created an OLAP pivot table. I am sure that the heads of the Excel team will spin when I say it, but just having access to Sets is enough to make me run every future pivot table through PowerPivot!

Do calculations that make Excel’s Calculated Fields look like they were designed by someone in kindergarten—Microsoft introduced a new formula language in PowerPivot called DAX. DAX stands for Data Analysis Expressions. DAX is composed of 117 functions that let you to do two types of calculations. There are 81 typical Excel functions that you can use to add a calculated column to a table in the PowerPoint window. Then you can use 54 functions to create a new measure in the pivot table. These 54 functions add incredible power to pivot tables. Some examples:

• COUNTROWS(DISTINCT()) finally lets you count the number of distinct rows.

• CALCULATE(Expression,Filter1, Filter2,...FilterN) is like SUMIFS but for any expression. (Think MAXIFS and more.)

• There are 34 time intelligence functions that let you compare TOTALYTD sales versus a PARALLELPERIOD.

Moderate Benefits of PowerPivot

The following benefits are nice, but not jaw-dropping amazing:

Compression—Excel workbooks with PowerPivot data are smaller than workbooks that use traditional PivotCache pivot tables. The data is still stored inside the .xlsx workbook file, but the PowerPivot team came up with better ways to compress the data.

Join two pivot tables with a single set of slicers—You can have one set of slicers that control two separate PowerPivot tables.

Slicer Autolayout—Slicers created in regular Excel are always one column and always start at the same size. Slicers created in PowerPivot attempt to use some IntelliSense to be sized appropriately. It is not foolproof, but at least you can see that the PowerPivot team is trying to be thoughtful about your slicers.

PivotCharts without PivotTables—Well, not really. But it looks like it. PowerPivot can automatically build a chart on your presentation worksheet and then tuck the linked pivot table away on another worksheet.

Why Is This Free?

Up until Excel 2010, what was the greatest innovation in spreadsheets? Your answer will depend on the type of work that you have to do, but some possible answers might be the following:

Pivot Tables—When Pito Salas brought the pivot table concept to Lotus 1-2-3, it meant that you never had to do @DSUM and /Data Table 2 anymore.

VLOOKUP—Join data from two tables. It is what allows people to do things in Excel that should be done in Access.

IF / SUMIFS / AGGREGATE—These functions allow various conditional calculations.

1048576 Rows in Excel 2007—For me, this feature meant that I would never have to open Microsoft Access again.

A new add-in for Excel 2010 is as good as all four of those innovations wrapped into one. You can now do analysis of massive data sets, even data 100 times larger than Excel 2007. You can join tables without writing a VLOOKUP. You have aggregation and time series functions that have been lacking in Excel. You have the capability to do all this in a pivot table. It is not an exaggeration to say that PowerPivot is the best spreadsheet improvement to come out of Microsoft since pivot tables debuted in 1993.

Why then is this free?

Because 500 million people use Excel. That is a massive market of people. I should know—I make a living selling books to that market. If you can somehow sell something to one hundredth of one percent of that market, you have a bestseller on your hands.

Note

image

In this book, I do not cover the server version of PowerPivot, but I mention some benefits of the server version.

The SQL Server Analysis Services team give the client side version of PowerPivot to 500 million people because they figure that some small tiny percentage of those people will upgrade to the server version of PowerPivot. To get a server version, you buy SharePoint and SQL Server and other expensive technologies. By empowering Excel pros with these amazing tools, they figure that they might double their existing customer base of SQL Server customers.

Benefits of the Server Version of PowerPivot

If you get your IT folks to install PowerPivot Server, you get these additional benefits:

Automatic Refresh—In the client side, you have to open PowerPivot every day and click Refresh to have PowerPivot read the updated data sources. With the server version, this can automatically happen overnight.

Publish to Report Gallery—With the server version, you can publish your PowerPivot pivot tables to a SharePoint server. Someone without Excel can open your workbook in a web page and use the slicers to filter the data. Those people will have a nice gallery of report thumbnails from which to choose. The people in IT will get to monitor which reports are actually used and by whom.

Drawbacks to Using PowerPivot

As you start using PowerPivot, you might run into a few annoyances.

No Grouping—PowerPivot cannot use the Group feature of pivot tables. I use this feature a lot to roll daily dates up to months, quarters, and years. You can work around this by using the DAX language to define year, quarter, and month columns, but it is not as simple as using the Group feature.

You lose Undo—PowerPivot is an add-in. Traditionally, when you run a macro or some external code, the Undo stack is cleared. Thus, anytime you deal with PowerPivot, you lose the ability to Undo anything before you go in to PowerPivot.

No VBA—You can automate regular pivot tables with VBA. You cannot use VBA to control PowerPivot.

No Drilldown—Usually, you can double-click a cell in a pivot table and see the rows that make up that cell. This feature is not in the first version of PowerPivot.

Excel 2010 only—PowerPivot works only with Excel 2010. You cannot use it with Excel 2007. You cannot use it with files stored in compatibility mode.

Installing PowerPivot

The main trick here is to get the PowerPivot add-in that matches your version of Office. Office 2010 ships in 32-bit and 64-bit versions. If you have a new computer running 64-bit Windows, it is possible that you have 32-bit or 64-bit office.

Note

image

If you plan to deal with millions of records, you want to go with the 64-bit versions of Office and PowerPivot. You are still constrained to a 2GB file size limit, but because PowerPivot can compress data, you can fit 10 times that amount of data in a PowerPivot file. The 64-bit version of Office can make use of memory sizes beyond the 4GB limit in 32-bit Windows.

Go to the File menu in Excel 2010 and select Help. The right side of the Backstage view shows a version number. If the version number ends with (64-bit), you need the 64-bit version of the add-in.

After installing the add-in, you should see a PowerPivot tab on the Excel 2010 Ribbon, as shown in Figure 25.1.

Figure 25.1. After successful installation, you have a PowerPivot tab in the Ribbon.

image

Case Study: Building a PowerPivot Report

This case study walks you through your first PowerPivot data mash-up. In this example, you create a report that merges a 1.8 million row CSV file with a store identifying data in Excel.

Note

image

This will not work for PowerPivot. You need to get rid of those extraneous rows at the top of the data set.

Your main table is a 1.8 million record CSV file called demo.txt. This file is shown in Notepad in Figure 25.2. It is important that you have column headings in row 1 of the CSV file. The point-of-sale vendor who provides this data usually had a “Run on mm/dd/yyyy” row at the top of the file, a blank row, and then headings in row 3.

Figure 25.2. This 1.8 million row file is too big for Excel.

image

Import a Text File

To import the 1.8 million row file into PowerPivot, follow these steps.

  1. Select the PowerPivot tab in Excel 2010.
  2. Select the PowerPivot Window icon. A new PowerPivot application window appears. PowerPivot offers two Ribbon tabs; Home and Design. The Home tab is shown in Figure 25.3.

    Figure 25.3. The Home tab of the PowerPivot application.

    image

  3. You want to import your main table first. This is the large CSV file shown in Figure 25.2. From the Get External Data group, select From Text. PowerPivot shows the Table Import Wizard.
  4. Select a Friendly Connection Name, such as Sales History. Click the Browse button and locate your text file. PowerPivot does not default to see the first row as column headers, so the data preview offers five unfriendly column names of F1, F2, F3, and so on (see Figure 25.4).

    Figure 25.4. Initially, the headers are not recognized.

    image

  5. Verify that your delimiter is a comma. The drop-down offers standard delimiters, such as comma, semicolon, vertical bar, and so on.
  6. Select the check box for Use First Row as Column Headers. The preview now shows the real column names.
  7. If there are any columns that you don’t need to import, clear them. The entire file is going to be read into memory. If you have extraneous columns, particularly columns with long text values, you can save memory by clearing them. Figure 25.5 shows the data preview with Units cleared.

    Figure 25.5. Choose which columns to import.

    image

  8. Note that there are filter drop-downs for each field. You can sort and filter this 1.8 million row data set here, although it will be slower than in a few steps from now. If you open a filter field, you can choose to exclude certain values from the import.
  9. Click Finish, and PowerPivot begins loading the file into memory. The wizard shows how many rows have been fetched so far (see Figure 25.6).

    Figure 25.6. In less than a minute, PowerPivot is up to 1.5 million rows.

    image

  10. When the file is imported, the wizard confirms how many rows have been imported, as shown in Figure 25.7. Click Close to return to the PowerPivot window.

    Figure 25.7. Success.

    image

  11. The 1.8 million row data set is shown in the PowerPivot window. Go ahead. Grab the vertical scrollbar and scroll through the records. You can also Sort, change the number format, or filter (see Figure 25.8).

    Figure 25.8. 1.8 Million records are in a grid that feels a lot like Excel.

    image

Note

image

Note that although this feels like Excel, it is not Excel. You cannot edit an individual cell. If you add a calculation in what amounts to cell E1, that calculation will automatically get copied to all rows. If you format the revenue in one cell, all the cells in that column will get formatted.

You can change column widths by dragging the border between the column names just like in Excel.

The Filters in PowerPivot are not as powerful as the new filters introduced in Excel 2007. In particular, the date columns do not show a hierarchical filter where you can choose a year or month.

If you right-click a column, a menu appears where you can rename, freeze, copy, hide, and unhide the columns (see Figure 25.9).

Figure 25.9. Right-click a column to rename it.

image

The bottom line is that you have 1.8 million records you can sort, filter, and later, pivot. This is going to be cool.

Add Excel Data by Copying and Pasting

The file imported previously has only StoreID as a field. It does not have the store name or location. But you probably have a small Excel file that maps StoreID to the store name and other relevant data. You can add this data as a new tab in PowerPivot. Follow these steps:

  1. Open this workbook in Excel.
  2. Select the data with Ctrl+*.
  3. Copy it with Ctrl+C.
  4. Click the PowerPivot tab. On the left side of the Ribbon is an icon to return to PowerPivot (see Figure 25.10).

    Figure 25.10. Copy Excel data.

    image

  5. Click the PowerPivot window icon. PowerPivot returns and you see your 1.8 million row data set.
  6. Click the Paste icon on the left side of the PowerPivot Home tab. You will see a Paste Preview window.
  7. Give the new table a better name than “Table”: perhaps StoreInfo (see Figure 25.11). Click OK.

    Figure 25.11. Give the pasted table a name.

    image

You now see the store information in a new StoreInfo tab. Notice that there are now two worksheet tabs in PowerPivot, as shown in Figure 25.12.

Figure 25.12. You now have two unrelated tables in the PowerPivot window.

image

Add Excel Data by Linking

In the previous example, you added the StoreInfo table by using Copy and Paste. This creates two copies of the data. One is stored in an Excel worksheet somewhere, and the other is stored in the PowerPivot window. If the original worksheet changes, those changes will not make it through to PowerPivot. An alternative is to link the data from Excel to PowerPivot.

To link to Excel data, that data must be converted to the Table Format introduced in Excel 2007.

  1. If you start with an Excel worksheet, make sure that you have single-row headings at the top, no blank rows or blank columns.
  2. Select one cell in the worksheet and press Ctrl+T. Excel asks you to confirm the extent of your table and if your data has headers (see Figure 25.13).

    Figure 25.13. Convert your regular Excel data to a Table.

    image

  3. The table gets a default format. You can use the Table Tools Design tab to change that format if the dark blue banded rows are too much for you.
  4. Go to the Table Tools Design tab. On the left side of the Ribbon, you see that this table is called Table1. Type a new name, such as StoreInfo.
  5. On the PowerPivot tab, select Create Linked Table, as shown in Figure 25.14.

    Figure 25.14. Use the Create Linked Table to get this data into PowerPivot.

    image

The table appears in the PowerPivot window.

Define Relationships

Normally, in regular Excel you would be creating VLOOKUPs to match the two tables. It is far easier in PowerPivot. Follow these steps:

  1. You link from one column in your main table to a column in another table. To simplify the relationship process, navigate to your main table and select a cell in the column from which you will link.
  2. Click the Design tab in the PowerPivot Ribbon.
  3. Select Create Relationship. The Create Relationship dialog appears. By default, the selected table and column appear in the first two fields, as shown in Figure 25.15.

    Figure 25.15. Define a relationship between tables. By selecting the key column before starting, two of the four fields are populated.

    image

  4. If you skipped step 1 and the correct table is not shown in the Table drop-down, select Demo from the Table drop-down.
  5. If you did not select the correct column in step 1, open the Column drop-down. Select StoreID.
  6. Open the Related Lookup table drop-down. Select StoreInfo.
  7. Because the column names match, PowerPivot automatically changes the Related Lookup Column to read StoreID, as shown in Figure 25.16.

    Figure 25.16. This simple dialog replaces the VLOOKUP.

    image

  8. Click Create. You’ve now created a relationship between the two tables.

    image To see a demo of defining relationships in PowerPivot, search for “Excel In Depth 25” at YouTube.

Add Calculated Columns Using DAX

One downside to pivot tables created from PowerPivot data is that they cannot automatically group daily data up to years. Before building the pivot table, let’s use the DAX formula language to add a new calculated column to the Demo table.

Follow these steps:

  1. Select PivotTable. You now see the PowerPivot tab back in the Excel window.
  2. Select to put the pivot table on a new worksheet (see Figure 25.20).

Figure 25.21 shows the initial screen. There are many things to notice.

Follow these steps to add a Year field to the Demo table:

  1. Click the Demo worksheet tab at the bottom of the PowerPivot Window.
  2. The column to the right of Revenue has a heading of Add Column. Click in the first cell of this blank column.
  3. Click the fx icon to the left of the formula bar. The Insert Function dialog appears with categories for All, Date & Time, Math & Trig, Statistical, Text, Logical, and Filter. Select Date & Time from the drop-down. You instantly notice that this is not the same list of functions in Excel. Five of the first six functions that appear in the window are exotic and new (see Figure 25.17).

    Figure 25.17. DAX offers a different list of functions than Excel does.

    image

  4. Luckily, some familiar old functions are in the list as well. Scroll down and select the YEAR function. Click the first date in the Date column. PowerPivot proposes a formula of =year(demo[Date]. Type the closing parentheses and press Enter. Excel fills in the column with the year associated with the date, as shown in Figure 25.18.

    Figure 25.18. A new calculated column is added. You want to rename this.

    image

  5. Right-click the column and select Rename Column. Type a name, such as Year.

There are many more columns that you might think of adding, but let’s move on to using the pivot table.

Build a Pivot Table

One of the advantages of PowerPivot is that multiple tables can share the same data and slicers. Open the PivotTable drop-down on the Home tab of the PowerPivot Ribbon. As shown in Figure 25.19, you have choices for a single pivot table, a single chart, a chart and a table, two charts, and so on.

Figure 25.19. You have many options beyond a single table or chart.

image

image To learn how to deal with two or more pivot charts, see the “Combination Layouts” section later in this chapter.

The PowerPivot Field List (see Figure 25.21) is a third variation of the pivot table field list. It is actually a new entry in the Task pane.

Figure 25.20. Select the location for the pivot table.

image

Figure 25.21. The PowerPivot field list is different from the regular pivot table field list.

image

• Both tables are available in the top of the Field List. The main table is expanded to show the field names, but you can expand the other table and add those fields to this pivot table.

• Two new sections in the drop zones offer vertical or horizontal slicers.

• Because you are in a pivot table, the PivotTable Tools tabs are available.

3. Select Revenue from the PowerPivot Field List. Expand the StoreInfo table. Select Region from the StoreInfo table. Excel builds a pivot table showing sales by region (see Figure 25.22). At this point, you have a pivot table from 1.8 million rows of data with a virtual link to a lookup table.

Figure 25.22. This pivot table summarizes 1.8 million rows and data from two tables.

image

At this point, you might want to go to the PivotTable Tools tabs to further format the pivot table. You could apply a currency format and rename the Sum of Revenue field. Choose a format with banded rows, and so on.

To show off some more features of the PowerPivot pivot table, let’s add some slicer functionality.

Slicers in PowerPivot

Slicers are new in Excel 2010. The slicers in PowerPivot are slightly different from slicers in regular Excel.

First, notice that the PowerPivot Field List offers boxes for Slicers Vertical and Slicers Horizontal. Vertical slicers will be placed to the left of your pivot table. They are great for long lists that might need a scrollbar. Horizontal slicers will go above your pivot table.

Drag the Year field to the Slicers Horizontal drop zone. The years appear in a small slicer surrounded by a big box, as shown in Figure 25.23. Your first reaction will be to make that big box smaller, but don’t.

Figure 25.23. Microsoft draws a big box around a small slicer.

image

  1. That big box is the slicer parent control. It is actually a drawing object that defines the boundary for all the horizontal slicers. If you make the slicer parent control box small, there won’t be room for additional slicers.
  2. Add Division and Era to the Slicers Horizontal.
  3. All of a sudden the box around the three slicers looks almost the right size. It is as if Microsoft knew that you were going to add two more slicers!
  4. Add Mall Developer to the vertical slicer. Because it has a long list of relatively long names, it fits well as the only vertical slicer.
  5. Slicers work the same as they do in regular Excel pivot tables. Click one item to select it. Ctrl+click additional items to select them as well.

Figure 25.24 shows the default slicers after applying a few filters.

Figure 25.24. Microsoft chooses the number of columns for each slicer.

image

You will probably like the PowerPivot slicers better than regular Excel slicers. The PowerPivot spec calls for some IntelliSense to choose how many columns might work for each slicer. The fact that the PowerPivot takes a guess at arranging the slicers means that you might not have to adjust the slicers. In regular Excel, you will find yourself always adjusting the slicers.

Tip

image

To format the slicers, you have to click the slicer, not the bounding box.

The slicer parent control box disappears after you click outside of the pivot table. It comes back when the PowerPivot Field List displays. You can resize that box if you want the slicers to take up more or less room. Click the box once and resizing handles appear.

Some Things Are Different

If you have spent your whole Excel life building pivot tables out of regular Excel data, you are going to find some annoyances with these PowerPivot pivot tables. Many of these issues are not because of PowerPivot. They are because any PowerPivot pivot table automatically is an OLAP pivot table. This means that it behaves like an OLAP pivot table.

Some items to note:

• Days of the week do not automatically sort into the proper sequence. You have to choose More Sort Options, Ascending, More Options.... Uncheck the AutoSort box. Open the First Key Sort Order dropdown and choose Sunday, Monday, Tuesday.

• There is a trick in regular Excel pivot tables that you can do instead of dragging field names to the right place. You can go to a cell that contains the word Friday and type Monday there. When you press Enter, the Monday data will move to that new column. This does not work in PowerPivot pivot tables!

• Decide between Compact, Tabular, and Outline layouts before you drag fields to the correct location. Otherwise, everything snaps back to the original sequence.

• The PowerPivot Field List looks like the regular Field List, but a lot of functionality is missing. You cannot access filters by hovering over fields in the top of the field list. You cannot rearrange the field list. If you add multiple fields to the Values drop zone, you cannot access the Values field to move it to a new position. The good news is that the real PivotTable Field List is available. Turn it back on using PivotTable Tools Options, Field List. You can then move the Values field to the proper location.

• When you enter a formula in the Excel interface, you can point to a cell to include that cell in the formula. You can do this using the mouse or the arrow keys. Apparently, the PowerPivot team is made up of mouse people, because they support building a formula using the mouse in PowerPivot. Old-time Lotus 1-2-3 customers who build their formulas using arrow keys will be disappointed to find that the arrow key method doesn’t work.

• You can use two types of refresh with these pivot tables. If you go to the PowerPivot tab and Update All, new data will be read from the data sources. This does not automatically refresh the pivot tables. You then have to go to the PivotTable Tools Options tab to click Refresh.

Certainly the Customer Experience Improvement Program data shows that everyone immediately goes from one Refresh to the other Refresh in quick succession. I would complain loudly that these are two separate steps, but remember that we are dealing with two different products here, so I can live with it.

Two Kinds of DAX Calculations

You’ve already seen an example where you used a DAX function to add a calculated column to a table in the PowerPivot window. There are 81 functions that are mostly copied straight from Excel for doing these types of calculations. The RELATED function can also be used in a calculated column to grab a value from a different table.

DAX can also be used to create new measures in the pivot table. These functions do not calculate a single cell value. They are all aggregate functions that calculate a value for the filtered rows behind any cell in the pivot table. There are 54 new DAX functions to enable these calculations. The real power is in these functions.

DAX Calculations for Calculated Columns

You’ve already seen one example of a calculated column. The functions are remarkably similar to the same function in Excel and mostly won’t require a lot of explanation.

In the Date & Time category are 17 functions. The first 16 are identical to Excel’s function. The rarely documented DATEDIF function in Excel is now renamed as YEARFRAC and is rewritten to actually work.

DATE(<year>, <month>, <day>)

DATEVALUE(date_text)

DAY(<date>)

EDATE(<start_date>, <months>)

EOMONTH(<start_date>, <months>)

HOUR(<datetime>)

MINUTE(<datetime>)

MONTH(<datetime>)

NOW()

SECOND(<time>)

TIME(hour, minute, second)

TIMEVALUE(time_text)

TODAY()

WEEKDAY(<date>, <return_type>)

WEEKNUM(<date>, <return_type>)

YEAR(<date>)

YEARFRAC(<start_date>, <end_date>, <basis>)

In the Information category are six functions from Excel:

ISBLANK(<value>)

ISERROR(<value>)

ISLOGICAL(<value>)

ISNONTEXT(<value>)

ISNUMBER(<value>)

ISTEXT(<value>)

In the Logical Functions are seven Excel functions. Don’t be concerned that SUMIFS is not in this list. See a discussion on CALCULATE later.

ISBLANK(<value>)

ISERROR(<value>)

ISLOGICAL(<value>)

ISNONTEXT(<value>)

ISNUMBER(<value>)

ISTEXT(<value>)

Math and Trig offer 22 familiar functions:

ABS(<number>)

CEILING(<number>, <significance>)

EXP(<number>)

FACT(<number>)

FLOOR(<number>, <significance>)

INT(<number>)

LN(<number>)

LOG(<number>,<base>)

LOG10(<number>)

MOD(<number>, <divisor>)

MROUND(<number>, <multiple>)

PI()

POWER(<number>, <power>)

QUOTIENT(<numerator>, <denominator>)

RAND()

RANDBETWEEN(<bottom>,<top>)

ROUND(<number>, <num_digits>)

ROUNDDOWN(<number>, <num_digits>)

ROUNDUP(<number>, <num_digits>)

SIGN(<number>)

SQRT(<number>)

TRUNC(<number>,<num_digits>)

In the Statistical category are 10 functions:

AVERAGE(<column>)

AVERAGEA(<column>)

COUNT(<column>)

COUNTA(<column>)

COUNTBLANK(<column>)

MAX(<column>)

MAXA(<column>)

MIN(<column>)

MINA(<column>)

SUM(<column>)

There are some items to note in the Text category.

First, you will have to start using CONCATENTATE or the & operator to build concatentations in PowerPivot. Because PowerPivot cannot join two tables based on two fields in each table, you will find that you are using CONCATENATE frequently to join fields together.

Second, the Excel TEXT function has been renamed to FORMAT. It still works the same. Apparently, the PowerPivot team wanted a more descriptive explanation of what TEXT actually does.

CODE(<text>)

CONCATENATE(<text1>, <text2>,...)

EXACT(<text1>,<text2>)

FIND(<find_text, within_text, start_num)

FIXED(<number>, <decimals>, <no_commas>)

FORMAT(<value>, <format_string>)

LEFT(<text>, <num_chars>)

LEN(<text>)

• LOWER(<text>)

MID(<text>, <start_num>, <num_chars>)

REPLACE(<old_text>, <start_num>, <num_chars>, <new_text>)

REPT(<text>, <num_times>)

RIGHT(<text>, <num_chars>)

SEARCH(<search_text>, <within_text>, [start_num])

SUBSTITUTE(<text>, <old_text>, <new_text>, <instance_num>)

TRIM(<text>)

UPPER (<text>)

VALUE(<text>)

Most of the items in the Filter and Value Functions category are used for creating new measures. Two items are of use in calculated columns.

First, DAX introduces the BLANK() function. Because some of the aggregation functions can base a calculation on either the ALLNONBLANKROW or FIRSTNONBLANK, you can use the BLANK() function in an IF() function to exclude certain rows from measure calculations.

Second, you have the RELATED() function, which is described next.

Using RELATED() to Base a Column Calculation on Another Table

The next several examples make use of a sample file called Ch10WeatherMashup.xlsx. This file started out with a generic sales by day by store data set. A company sells products both in a mall location and at an airport. The two stores are less than 10 miles apart but might show different sales trends.

I used a web query and a macro to download weather data for the 3 years of daily dates. Several examples in the DAX Measure calculation mash up the sales and weather data to look for trends.

While preparing this workbook, I used DAX to create some calculated columns:

• WeekdayName uses =FORMAT(Sales[Date], "dddd") to convert the date to the day of the week. I was disappointed that PowerPivot pivot tables would not respect the custom list when presenting weekdays, so I added a new WeekdayID column.

• WeekdayID uses =WEEKDAY(Sales[Date],2). The “2” argument matches the same function in Excel. “2” numbers the days starting with 1 for Monday through 7 for Sunday. The Weekday table is a 7-row table that maps 1 to “1-Mon” and so on. This way, the day names would sort correctly.

• LocationDays is =CONCATENATE(Sales[Location],Sales[Date]). You see this used later when calculating the distinct number of store days.

Next, I wanted to calculate sales per store associate. The two locations have different scheduling requirements. The airport location usually operates with one associate, but staffs up to two on the busy travel days of Sunday, Monday, and Friday. The mall location staffs up on Friday and Saturday. To calculate the number of staff on a given day, you need to concatenate Location and Weekday. The formula for LocationWeek is =Concatenate(Sales[Location],Sales[WeekdayName]). Note that calculated columns have no problem referring to other calculated columns.

A relationship links this column to the Staffing table.

You might think that because PowerPivot understands the relationship between the Sales table and the Staffing table, you could write a formula such as =Sales[Net Sales]/Staffing[Staff Level], as shown in Figure 25.25. Unfortunately, this evaluates to an error.

Figure 25.25. Calculated columns don’t automatically use the defined relationship.

image

The problem is that the calculation is trying to divide this row’s sales of 2202 by all 14 values in the Staffing table.

The solution is to use the Related function. Rewrite the formula as =Sales[Net Sales]/ Related(Staffing[StaffLevel]). The related function tells DAX that you don’t want to divide 2202 by all 14 values in the table but only by the one value that is related to AirportSunday.

Figure 25.26 shows the result. In the first row, the airport location had two people staffing the store. Thus, the sales per person is half of the 2202. On Tuesday, only one person staffs the store, so SalesPerPerson is the same as Net Sales.

Figure 25.26. The Related function is like a 1-argument VLOOKUP.

image

Using these calculated columns and relationships, you can create some interesting pivot tables.

Figure 25.27 shows an analysis of sales by weekday at the two locations. You can see in cell I10 that sales peak at the Mall on Saturday. At the airport, you would think that sales would peak on Friday when business travelers need a gift on their way back home. Instead, they peak on Sunday when business travelers are purchasing new items for their upcoming business meetings.

Figure 25.27. Sales at the airport location peak on Sunday.

image

The percentages in rows 9 and 10 of Figure 25.27 work out okay because over 3 years, there are roughly the same number of Mondays as Fridays in the data set.

As you continue to do other analyses, the results are not as meaningful. In Figure 25.28, a report of sales by the amount of rain shows that most sales happened on sunny days. But this could just be telling you that it is sunny a lot more in Florida than rainy.

Figure 25.28. Do people hit the mall more often on sunny days? You cannot tell from this report.

image

The answer is to use DAX to create new measures that calculate sales per store per day.

Using DAX to Create New Measures

A Measure is the OLAP term for a Calculated Field.

But DAX Measures can run circles around Calculated Fields.

Before you dive in, you need to remember one mantra: “Filter first, then calculate.” To understand this mantra, consider cell C15 in Figure 25.29.

Figure 25.29. How many filters are on cell C15?

image

Think about how many filters are applied to cell C15. Would you say two? I think that the answer is four.

Everyone would agree that cell C15 is filtered to show only records that fell on a Saturday with high temperatures in the 80s. That is two filters.

In addition, the row and column fields are really filters as well. For cell C15, you want only records with sales at the airport location. You also only want days where there was no rain. That is two additional filters for cell C15.

As you start to think about DAX measures, remember that to figure out the measure for a particular cell in the pivot table, they first filter and then calculate the result using the DAX formula.

Count Distinct Using DAX

DAX lets you count how many distinct values meet the filter.

Wait, that is so good; I am going to repeat it.

DAX lets you count how many distinct values meet the filter! Do you understand the gravity of that statement? People who create advanced pivot tables always get tripped up because pivot tables cannot come up with a distinct count of something.

In Chapter 27, Automating Repetitive Functions Using VBA Macros,” I offer a ridiculous formula =1/COUNTIFS(...) to try to replicate the Count Distinct. DAX will now let you count how many distinct values meet the filter.

To create a new Measure in DAX, use the New Measure icon in the PowerPivot tab. It seems confusing, so I will point out that this is not a tab in the PowerPivot window; it is the PowerPivot tab on the Excel Ribbon. To be clear, click the New Measure icon shown in Figure 25.30.

Figure 25.30. New Measure is on the Excel Ribbon on the PowerPivot tab.

image

When you click New Measure, you get the Measure Settings dialog, as shown in Figure 25.31.

Figure 25.31. Edit measures in this dialog.

image

  1. The Table Name should be the base table where your main numerical data is located. Change the first drop-down from Weather to Sales.
  2. For the Measure Name, use a name such as DayCount.
  3. Use the same name for Custom Name.

Measures are always aggregate functions, not cell-level functions. Thus, you must use an aggregate function such as SUM or COUNTROWS.

The magic function here is Distinct(Sales[Date]). For any cell in the pivot table, the distinct function returns a list of the distinct values for the rows that match the filter. Note that distinct must be used on a column in the home table. It cannot be applied to a value in a linked table. It is affected by filters applied to linked tables, but you must be using a value in the home table.

  1. Distinct returns a one column table with a list of the distinct values. To count how many items there are, use =CountRows(Distinct(Sales[Date])).
  2. After typing the formula, click the Check Formula button to make sure that your syntax is correct (see Figure 25.32).

    Figure 25.32. Build a formula and then check the syntax.

    image

Think of DayCount as an intermediate result to illustrate the concept of using Distinct. You could then define a measure of Net Sales divided by DayCount. However, you could also skip DayCount altogether and build Sales Per Day with a single formula:

=SUM(Sales[Net Sales])/COUNTROWS(Distinct(Sales[Date]))

Figure 25.33 shows Sales Per Day based on the amount of rain and the location.

Figure 25.33. Both locations do better on sunny days.

image

You might have thought that sales would pick up in the airport on rainy days because of rain delays. Apparently, people are too stressed out to shop when their flights are delayed.

The airport location was open for all of the 3 years. The mall location opened late in 2006, so there are fewer days for the mall location. The airport is open on Christmas, but the mall is not. Thus, there are many days where only one store was open.

Column I shows total sales of both stores. The DayCount in column J counts a day when either one store or the other was open. Thus, both stores did sell $6.3 million over the course of the data set. But because both stores were not open for the entire period, the calculation of $6.3 million divided by 1086 days is wrong.

Caution

image

Something is wrong with the totals in Figure 25.33. The Grand Total row for the airport is accurate; there were really an average of $2,665 in sales per day at the airport. But if the airport is averaging $2,665 a day and the mall is averaging $4,526 a day, how can the Grand Total column be showing $5,833?

Luckily, you have the intermediate DayCount column available. Figure 25.34 shows a test report showing sales, days, and sales per day.

Figure 25.34. Something is wrong in J12.

image

The solution is to count the distinct number of a concatenated column of location and date. The Location Days column is a calculated column in the PowerPivot window. To see it, turn back to Figure 25.25. In Figure 25.35, two new measures appear:

LocationDayCount =CountRows(Distinct(Sales
[LocationDays]))
Sales Per Store Per Day =Sum(Sales[Net Sales])/
CountRows(Distinct(Sales[LocationDays]))

Figure 25.35. This calculation works better.

image

Although these new measures produce the same results for the Airport or Mall Sales Per Day; the improvement is that the final column in K shows the true average sales per store per day.

When “Filter, Then Calculate” Doesn’t Work in DAX Measures

Suppose that you create a DAX Measure for SUM(Sls[Sales]). By definition, all filters are taken into account before PowerPivot starts calculating. To get the 851 in cell D6 in Figure 25.36, the program will filter the data to where Rep=Bill, Date = 6/2/2011. The remaining rows are added to get the 851.

Figure 25.36. Excel applies a filter to calculate the 851.

image

In essence, without the DAX calculation specifying any filters, the result for cell D6 in the pivot table is automatically going to sum up the rows shown by the arrows in Figure 25.37.

Figure 25.37. The answer in Figure 25.36 adds up all the arrow rows.

image

Incidentally, the formula in E12 of Figure 25.37 is =SUMIFS(Sls[Sales],Sls[Rep], "Bill",Sls [Date],DATE(2011,6,2)) and also returns 851. The DAX formula of =SUM(Sls[Sales]) is a bit shorter, isn’t it?

You should appreciate that for most calculations, you never have to specify a filter.

This assumption leads to trouble, though, when you need part of your formula to look at all rows, not just the filtered rows.

Suppose that you want to see how the sales compared to the total sales for the month. You need to calculate a fraction. The numerator of the fraction is going to be SUM(Sls[Sales]). The denominator of the fraction needs to be all the records in the sales table. That is going to be tougher.

In DAX, you don’t use SUMIFS; you use Calculate. Calculate asks for an expression and then one or more filters. For those filters, you use a special function called ALL. If you ask for Calculate (Sum(Sls[Sales]),ALL(Sls)), the filter is almost an “anti-Filter.” Rather than further limiting the calculation, ALL says that you want it to look not just at Bill’s sales for 6/2/2011 but all the sales in the table.

In Figure 25.38, a new measure calculates % of Grand Total sales by using =SUM(Sls[Sales])/ Calculate(Sum(Sls[Sales]),All(Sls)). The % of Grand Total for cell F7 says that Bill’s $851 in sales on June 2 represents 0.9% of the grand total sales.

Figure 25.38. The denominator of these calculations is always the tough part.

image

The past example might seem trivial, because you could replace that calculation with Show Values As, % of the Total. But don’t skip over understanding the ALL(Sls) syntax. After you understand that syntax, you can replace the first argument in Calculate with Max, Min, Average, or any function. So Calculate becomes like SUMIFS, AVERAGEIFS, MINIFS, MAXIFS, and so on.

Early on in the days of the PowerPivot beta, many people blogged an example or two that calculated the % of the total using Calculate and All. I don’t think that this is the most powerful use of calculated measures.

Suppose that you want to calculate how Bill’s $851 sale on June 2 compared to all sales on June 2. The numerator of the DAX Measure will be =Sum(Sls[Sales]). Again, the denominator is going to be the hard part.

For the denominator, you want to say, “Look at all the sales that match today, but don’t pay any attention to the sales rep filter. Give me all sales reps.”

If you asked for ALL(Sls), the calculation would throw out all the filters.

This time, you need to ask for AllExcept(Sls,Sls[Date]). I can’t even count if this is a quadruple or a quintuple negative, but I can tell you that it takes some time to wrap your head around this. You might think of it like this: “DAX is already going to be filtering by date and sales rep. Go ahead and throw out all the filters except for the Date filter. Keep filtering by date.”

In Figure 25.39, the DAX Measure calculates =SUM(Sls[Sales])/Calculate(Sum(Sls[Sales]),AllExcept(Sls,Sls[Date])). This calculation shows you the percentage of each day’s sales achieved by a certain rep.

Figure 25.39. AllExcept says to ignore all filters except the Date filter.

image

You can also override the filters by specifying other filters in the Calculate function. The actual syntax of the Calculate function is: Calculate(Expresion,[filter 1], [filter 2], [filter 3], [filter 4], ...). Just like SUMIFS in Excel 2010, you can keep adding additional filters.

=Calculate(Sls[Sales],Sls[Rep]="Amber") will get all Amber sales for this row. If Amber is the sales star in the store, perhaps someone would want to show everyone’s sales as a percentage of Amber’s Sales.

=SUM(Sls[Sales])/ Calculate(Sls[Sales],Sls[Rep]="Amber") shows sales as a percentage of Amber’s total sales for that day. See the report in Figure 25.40. I wonder if the store manager posted this report if Chris would call home on the 16th to say, “Guess what! I sold 259.2% of that suck-up Amber.

Figure 25.40. Show all values as a % of Amber is not one of the built-in selections.

image

Mix In Those Amazing Time Intelligence Functions

Remember that you can apply many filters in the Calculate function. As you saw in the previous example, you could filter to show total sales for the rep named Amber. You can also filter to all dates that match a certain date function.

There are 34 Time Intelligence functions. Suppose that you want to calculate a running MTD total. You can use the Calculate function and specify a filter of DatesMTD(Sls[Date]). That’s the complete filter. You don’t need to say, “This row’s date falls within the MTDDates compared to the current date in the report.” You just have to say DatesMTD(Sls[Date]).

In pseudo code: “I want a formula that will add up all the sales for dates that fall in the MTD period compared to the current row but only for reps that match the current column.”

Add up all the sales is =Calculate(Sum(sls[Sales]).

For dates that are MTD is DatesMTD(Sls[Date]).

But only for reps that match is AllExcept(Sls,Sls[Rep]).

The complete formula in Figure 25.41 is =Calculate(Sum(sls[Sales]), DatesMTD(Sls[Date]),AllExcept(Sls,Sls[Rep])).

Figure 25.41. MTD Sales for each rep.

image

Notice that Dale did not work on June 4 and thus gets no calculation for MTD Sales on that date. If you use pivot tables a lot, you understand that empty cells usually show up as blank, and you generally change the pivot table options to have empty cells show up as 0. The same thing is happening here. Because there was no data for Dale for June 4, none of the MDX measures gets calculated for that cell. The solution would be to go back to the original data and add in a zero-sale record for every person for every day.

If you download the sample files for this book and look at Ch10DAXMeasures.xlsx, you see other calculated measures in the field list.

Tip

image

The filtering techniques such as All() and AllExcept() work well in the home table. If you need to have a calculation ignore or respect a filter in a linked table, your results may vary. It is always best to have the filter fields in the home table. One workaround is to use the =Related() function in the PowerPivot window to bring a copy of the field from the linked table into the home table.

Note that a measure can refer to another measure. To get to This Rep’s Percentage of MTD Sales versus all MTD Sales, I first built MTDThisRep as shown previously. I then built MTD All Reps using =Calculate(Sum(Sls[Sales]),DatesMTD(Sls[Date]),All(Sls)). When it came time to create the formula to divide those two, it was easiest to simply build the formula as =Sls[MTDThisRep]/Sls[MTD All Reps]), as shown in Figure 25.42.

Figure 25.42. You can use previously defined measures to simplify the calculation for another measure.

image

The following is a complete list of Time Intelligence Functions.

CLOSINGBALANCEMONTH(<expression>,<dates>,<filter>)

CLOSINGBALANCEQUARTER(<expression>,<dates>,<filter>)

CLOSINGBALANCEYEAR(<expression>,<dates>,<filter>)

DATEADD(<date_column>,<number_of_intervals>,<interval>)

DATESBETWEEN(<column>,<start_date>,<end_date>

DATESINPERIOD(<date_column>,<start_date>,<number_of_intervals>,<intervals>)

DATESMTD(<date_column>)

DATESQTD (<date_column>)

DATESYTD (<date_column> [,<YE_date>])

ENDOFMONTH(<date_column>)

ENDOFQUARTER(<date_column>)

ENDOFYEAR(<date_column>)

FIRSTDATE (<datecolumn>)

LASTDATE (<datecolumn>)

LASTNONBLANK (<datecolumn>,<expression>)

NEXTDAY(<date_column>)

NEXTMONTH(<date_column>)

NEXTQUARTER (<date_column>)

NEXTYEAR(<date_column>[,<YE_date>])

OPENINGBALANCEMONTH(<expression>,<dates>,<filter>)

OPENINGBALANCEQUARTER(<expression>,<dates>,<filter>)

OPENINGBALANCEYEAR(<expression>,<dates>,<filter>)

PARALLELPERIOD(<date_column>,<number_of_intervals>,<intervals>)

PREVIOUSDAY(<date_column>)

PREVIOUSMONTH(<date_column>)

PREVIOUSQUARTER(<date_column>)

PREVIOUSYEAR(<date_column>)

SAMEPERIODLASTYEAR(<dates>)

STARTOFMONTH (<date_column>)

STARTOFQUARTER (<date_column>)

STARTOFYEAR(<date_column>[,<YE_date>])

TotalMTD(<expression>,<dates>,<filter>)

TotalQTD(<expression>,<dates>,<filter>)

TotalYTD(<expression>,<dates>,<filter>)

Check out the optional Year Ending date parameter in DatesYTD and other arguments; yes, you can finally deal with fiscal years other than those ending on December 31!

For completeness, these are the remaining functions supported by DAX:

BLANK()

RELATED(<column>)

ALL(<table_or_column>)

DISTINCT(<column>)

EARLIER(<column>, <number>)

EARLIEST(<table_or_column>)

VALUES(<column>)

ALLEXCEPT(<table>,column1>,<column2>,...)

CALCULATE(<expression>,<filter1>,<filter2>...)

CALCULATETABLE( <expression>, <filter1>, <filter2>,...)

FILTER(<table>,<filter>)

RELATEDTABLE(<table>)

ALLNONBLANKROW(<column>)

FIRSTNONBLANK(<column>,<expression>)

AVERAGEX(<table>, <expression>)

COUNTAX(<table>, <expression>)

COUNTROWS(<table>)

COUNTX(<table>, <expression>)

MAXX(<table>, <expression>)

MINX(<table>, < expression>)

SUMX(<table>, <expression>)

Other Notes

The topic of PowerPivot deserves a whole book. (In fact, you can find my whole book on the subject: PowerPivot for the Excel Data Analyst, published by QUE.)

The next section cover a few miscellaneous topics that didn’t make it elsewhere in this chapter.

Combination Layouts

The PivotTable drop-down in the PowerPivot Window offers eight choices.

The first choice is a single pivot table and has been used throughout this chapter.

The last choice is a flattened pivot table. That is a pivot table that starts in Outline layout instead of Compact layout. The Repeat All Row Labels feature is turned on. If you plan to convert the pivot table to values to reuse it, choosing a flattened pivot table can save you a few clicks along the way.

The other six layouts include pivot charts. I don’t get this. PivotCharts look great in Microsoft demos, but no one actually uses them. I can see why Microsoft had to put them here, because it will give them something to demo, but I cannot figure out why Microsoft gives you six different versions. If one PivotChart is bad, why would anyone ever want four of them?

But, assume you are actually trying to do this and you found this section in the index.

When you choose a combination of multiple elements, you have multiple outlines on the worksheet. In Figure 25.43, the pivot table on the right is the active table. You can tell because the cell pointer is inside the outline for that pivot table. Any changes that you make to the PowerPivot Field list affect that pivot table first.

Figure 25.43. A combination report offers one or more pivot tables or charts.

image

When you are ready to work on another element in the combination, click that element. The Field List resets to blank, and you can design that element. All elements share the same slicers.

Note that for each chart on your layout, Microsoft inserted a new worksheet to hold the actual pivot table for the chart.

Report Formatting

I am excited about PowerPivot. It lets people who cannot do VLOOKUPs mash up data and do reports that have never been imagined before. However, every Microsoft blog is busy showing the same layout for PowerPivot demos.

Because you’ve probably seen 1,000 of them, let’s talk about how to replicate those layouts shown in the blogs and press:

  1. Insert a new worksheet to hold the workbook.
  2. Create a combo of two or four pivot charts. Choose a location rather than letting them default. Choose a spot on row 5 of the new workbook.
  3. Add as many slicers as possible to the top and left of the chart.
  4. Build the charts.
  5. Make row 1 very tall—perhaps 270 to 300 points tall. Use Insert, Screenshot to add an interesting graphic to row 1.
  6. Add an interesting graphic below the charts to balance the graphic on top of the charts.
  7. Go to File, Options, Advanced, Display Options for This Worksheet. Clear the Gridlines check box. If you want to go all out, scroll up and remove the scrollbars, sheet tabs, and formula bars.
  8. Minimize the Ribbon.
  9. Add a fill color behind the whole worksheet.
  10. While the pivot table is active, click the bounding box around each slicer. Right-click the border. Select Properties. Select Move and Size with cells.
  11. Click away from the pivot table. See the result in Figure 25.44.

    Figure 25.44. This dashboard tracks how many publications have shown this style of dashboard generated by PowerPivot.

    image

If your layout contains an actual pivot table, consider converting the pivot table to formulas. You can then insert extra rows between the pivot table rows, adding color, and so on.

Refreshing PowerPivot Versus Refreshing Pivot Table

Suppose that your underlying data changes. If it is stored in an Excel linked table, you would go to the PowerPivot tab in the Excel Ribbon and click Refresh All. If it is stored in an External Data Source, you would go to the PowerPivot window and select the Refresh button. If the data were pasted to PowerPivot, you can Paste Append new data or do a Paste Replace.

This does not refresh the pivot table! When you return to Excel, you have to remember to go to the PivotTable Tools Options tab and select Refresh.

Getting Your Data into PowerPivot with SQL Server

Data coming from SQL Server already has a lot of relationships defined. Find the main Fact table, select that table, and then click the button for Select Related Tables. PowerPivot reads the database schema and brings in all the tables with relationships predefined. It is, of course, then possible to add in additional Excel or text data to mash up with the SQL Server data.

Other Issues

Can multiple relationships exist between two tables? No. If you need two relationships, import the lookup table twice and link to each copy separately.

Will PowerPivot ever be available for Excel 2007? No. PowerPivot relies on a number of features added to Excel 2010.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.47.224