PowerPivot for Self-Service BI
After reading the last two chapters you now understand how to create a valid PowerPivot data model. This data set will allow you to slice and dice your data in many different ways and can be used as-is with many tools such as Excel pivot tables and, of course, Power View.
Indeed, there is little to prevent you now from starting to use the data model that you have created in pivot tables or, better still, to deliver stunning Power View visualizations. However, while we are on the subject of finalizing the data model, there are a handful of tweaks that you may want to apply to the tables in the data set. These modifications are not always necessary, and in many cases might not be required at all. Yet there could be an equal number of times when you will need to spend a couple of minutes preparing the data so that any output (and specifically Power View reports) can deliver what you expect immediately and flawlessly.
The kind of tweaks that we are looking at include these:
None of the techniques that you will see in this chapter are particularly complicated. Few of them take any time at all to apply. Yet, used effectively, they can enhance the output that you present to your audience.
If you want to follow the examples in this chapter, use the sample file PowerPivotSampleDataset.xlsx. This file is available on the Apress web site and contains the data set prepared according to the instructions given in the previous two chapters. You can install this file as described in Appendix A.
Some data tables will contain hundreds of fields (or columns, if you prefer); others will contain only a few. Many will contain a subset of columns that you will want to use frequently in Power View visualizations. PowerPivot allows you to define a default field set for each table that “remembers” the frequently used columns so that you can use them all at once in a table or chart, without having to add them individually and potentially laboriously. This subset is called the default field set, and it will also be the list of columns that is returned by the Power BI Q&A (natural language query) functionality, which you will learn more about in Chapter 15.
What is more, a default field set is easy to apply in Power View, because all you have to do is click on the table in the Power View Field List. Power View will immediately create a table using only the columns in the default field set, in the order in which you defined them when you created the field set. Once this is done, nothing is stopping you from removing any unwanted columns from the table; this is often faster than adding columns individually. Here, then, is how you create a default field set.
You will not see any immediate change in Power View. However, the next time that you click on the SalesData table in the Power View Field List, a table containing the fields Make, Marque, CarAgeBucket, and SalePrice will be created, in this order.
Note You cannot create a table in Power View using the default field set by dragging the table name on to the Power View canvas.
If you prefer, rather than double-clicking in step (4) above, you can select a field name in the left-hand pane and click the Add button in the Default Field Set dialog. In addition, to remove a selected field, all you have to do is click on a field name in the right-hand pane and click the Remove button in the Default Field Set dialog. If you want to change the order of the fields that make up the Default Field Set, then all you have to do is click on the field name in the right-hand pane and click the Move Up or Move Down buttons. An existing default field set can be modified at any time by following the steps given earlier.
Sometimes the data in a table may be accurate and complete, but it will not display exactly as you hoped it would. For example, you may want to sort on a column but find that it needs custom sort criteria so that another column actually provides the sort order. Alternatively, you may want to prevent grouping on certain columns—so that PowerPivot does not presume that all your clients named John Smith are the same person, for instance. These and other options are possible and easy to apply so that your final output projects the effect you require in a clear and transparent way. Together these options are known as modifying the table behavior and are applied where necessary to individual tables.
An initial aspect of table behavior that you could find yourself wanting to apply frequently is defining a sort column. Take, for instance, the CarAgeBucket column in the SalesData table in the sample PowerPivotSampleDataset.xlsx data file. If you sort this column directly, the unique values <5 and >30 will appear at the top of the sorted data. Clearly this is not what you want to see, because a viewer would count on the “greater than the fixed upper limit” range to be at the bottom of the ascending sort, not at the top. This kind of logical anomaly can be solved by using another column to provide the sort order. Fortunately this was foreseen in the previous chapter when we created not only the CarAgeBucket column but also the CarAgeBucketSort column. So here is how you apply the CarAgeBucketSort column to sort the CarAgeBucket column correctly.
Figure 11-1. Applying a Sort By column to another column
Now if you sort the data using the CarAgeBucket column as the sort key, the data will actually be sorted using the data in the CarAgeBucketSort column.
Note Columns such as CarAgeBucketSort and MileageRangeSort are ideal candidates to be hidden from the user, as they are not really directly useful so they can be considered more a part of the data set infrastructure. Hiding columns is explained later in this chapter.
If you are preparing the PowerPivot data set for use in the Power View chapters of this book, you will need to add Sort By Columns for the fields in Table 11-1.
Table 11-1. Columns in the Data Set Requiring Sort By Columns
Table |
Column |
Sort By Column |
---|---|---|
SalesData |
CarAgeBucket |
CarAgeBucketSort |
SalesData |
MileageRange |
MileageRangeSort |
Date Table |
QuarterAndYearAbbr |
YearAndQuarterNum |
Date Table |
QuarterAndYear |
YearAndQuarterNum |
Date Table |
MonthAndYearAbbr |
YearAndMonthNum |
The remaining aspects of table behavior that you can change to enhance output all require that the table contain a unique identifier for each row of data. If your data comes from a relational database, then it probably already has what is known as a unique primary key. This is the case for the Countries and Clients tables, so these are the ones that we will use to show the remaining table behavior options.
Before applying any other table properties, we need to tell PowerPivot which column contains the unique key. This is called setting the row identifier.
Figure 11-2. Setting a row identifier for a table in PowerPivot
Once you have successfully set the row identifier, you can proceed to set other table behavior options, as you will see in the next few paragraphs.
Note If the column that you chose as the row identifier does not contain unique data, then you will get an error message and will not be able to set table behavior. This is something that will have to be dealt with in the source data itself.
One table behavior option that can be extremely useful is the ability to prevent PowerPivot (and consequently Power View) from aggregating data for a column. As an example, imagine that you have two clients with the same name. They could be subsidiaries in different states or countries. Anyway, you do not want Power View to display a single total when you analyze data by client. If you tell PowerPivot to keep unique rows for the column that contains the client names, then you can achieve this aim.
Now when you use the ClientName field in Power View, each client will appear separately, even if more than one client has the same name.
Power View treats all text fields as labels. Such egalitarianism may be laudable in many circumstances, but it can diminish certain visual effects when you are using tiles, for instance. So one option is to set a default label field. Put simply, any default label field will be given prominence in certain Power View visualizations.
Let’s take Countries as an example here, because it contains two fields that might be used in visualizations. However, let us presume that we need the CountryName field (the renamed CountryName_EN field) to stand out when visualizations are created. Here is how you can set a field to be a default label.
Now, when you create cards in Power View, you get the sort of difference that you can see in Figure 11-3.
Figure 11-3. Cards in Power View when the default field label is used
Set a Default Aggregation (Summarize By)
In PowerPivot itself, and in client tools such as Power View, you will inevitably be aggregating numeric data. Normally, this means summing up the data in a table or matrix. In some cases, however, you may find that you want a different aggregation to be applied when you are analyzing your data. Now, you can certainly override any default aggregations when creating Power View visualizations, as we saw in Chapter 2. But it can get very wearing to override the default time and time again in dozens of visualizations in possibly hundreds of reports. So to avoid this waste of energy, define the default aggregation for any field in a PowerPivot data set.
Figure 11-4. Defining a default aggregation
From now on, the selected aggregation will be applied whenever you use this metric in a Power View table, matrix, or chart. This does not mean that you are stuck forever with the default aggregation that you set, just that it will be used by default until you apply another in a specific visualization.
Note Setting a default aggregation can just well mean indicating that no aggregation must be applied. This is essentially required when a column contains a numeric value that has no meaning if it is summarized. One example is a column containing years; another is a column of IDs. In these cases, you should choose Do Not Summarize as the aggregation.
Preparing Images for Power View
PowerPivot will not display images, but it can reference or contain images so that Power View can use them to impress your audience. You can make PowerPivot return images to Power View in two ways. Both require a column of data containing either of the following:
Both require a small amount of preparation. Indeed, the two methods are very slightly different depending on how the image is stored. Yet the result is well worth the small amount of effort required to deliver some impressive results, as I am sure you will agree.
Before walking through the techniques for specifying to PowerPivot that fields contain either images or references to images, I think that it is best to explain what the source data must be.
In any case, these are requirements that must be dealt with at the level of the data. Remember that you cannot alter the data itself in PowerPivot.
I will once again take the Countries table as an example here, as it contains binary images that were previously loaded into the database table—and that consequently have been imported into PowerPivot. To tell PowerPivot that a field actually contains an image:
That is all you have to do. From now on, Power View will recognize this field as an image and display the image in visualizations.
In some cases you may have data that does not contain the image but refers to it, either in a network share or on the web. In these cases your source data will (must) have a column that contains the complete path to the image file, including the file name. However, you will need to tell PowerPivot that the text that is imported is not just a label but contains the path to an image. Here is how this can be done for an image on disk:
As you know, Power View treats all labels as equal; it does the same with images. However, you can choose to decide that some images are more equal than others, and consequently, you give them preeminence when they are displayed. To do this:
PowerPivot now knows that this image is the default image it should use in output and visualizations.
Preparing Hyperlinks for Power View
Another visualization technique that can impress your audience is to include hyperlinks in Power View reports. Once again, all that this requires is a little preparation.
1. Click on the tab for the table for which you want to define the column containing the hyperlink (Clients). If you are in diagram view, click on the table.
2. Click on the column that you know contains the binary data for the image (ClientWebSite in this example).
3. Switch to the Advanced tab (unless it is already active).
4. In the Datacategory popup at the right of the ribbon, select Web URL.
If you use this field in Power View, it will not just display the URL—it becomes a hyperlink.
Note You could find that PowerPivot actually recognizes most URLs for what they are and sets them as Web URL (suggested) in the Datacategory popup.
Hierarchies can be an extremely powerful complement to your data set. Although they are rarely strictly necessary, they can make your data set both easier to understand and easier to use.
A hierarchy is a set of columns in a table that guide the user through a predefined path from the highest level to the lowest level. As this is probably best understood with an example, consider how you might describe cars when you are analyzing sales. You probably want to start with the make of car, and then the model. Let's create a hierarchy based on these two elements.
Figure 11-5. A hierarchy in PowerPivot
Using a hierarchy (in Power View, for instance) is easy. All you have to do is drag the hierarchy onto the Power View canvas; when you do a table containing all the fields that you defined in the hierarchy will be created in the order in which you created them. So a single click can convert this table into a matrix with a predefined progression through the data, or into a drill-through chart. If you want to use only one of the fields from a hierarchy, then (in Power View again) you just expand the hierarchy and drag the field that you want onto the Power View canvas.
Tip Once you are used to creating hierarchies in PowerPivot you might like to accelerate the process slightly using the following technique. Start by right-clicking on the first field that you want to appear in a new hierarchy before you select Create Hierarchy. A new hierarchy will then be created using this field. You can then rename the hierarchy and add other fields as described previously.
You can create a hierarchy of many levels in PowerPivot. However, as this technique is primarily to help and guide the user, I would advise that you not overdo it. You can also rename any level in a hierarchy just as you rename a field. PowerPivot will always place the original field name in brackets after the hierarchy name so that you can see which field was the source.
Hierarchies are as easy to modify as they are to create. You can add or remove levels, change the order of levels in a hierarchy, or remove the hierarchy entirely.
To add a level to a hierarchy all you have to do is
When you drag a field onto an existing hierarchy, the cursor becomes a thick black line that indicates where the added level will be placed. This is shown in Figure 11-6.
Figure 11-6. Adding a level to a hierarchy in PowerPivot
Removing a Level from a Hierarchy
To remove a level from a hierarchy all you have to do is
Note Removing a level from a hierarchy has no effect on the data field, which remains in the field list.
Altering the Levels in a Hierarchy
You do not have to re-create an entire hierarchy to modify the levels that you previously created. Suppose that you wish to change the order of the elements in a hierarchy.
If you prefer to use the mouse, then you can simply drag a level in a hierarchy up and down to reorder the hierarchy. The cursor will become a thick black line to indicate where the level will be moved to.
To delete a hierarchy
You can, of course, use the Delete key to remove a previously selected level in a hierarchy, provided that you confirm your action.
Note Deleting a hierarchy has no effect on the data fields on which it was based, which remain in the Field List.
If you are using a field in a hierarchy, you probably do not want to see the field name twice in Power View—once as a field and once again in the hierarchy. PowerPivot has a quick trick to help out here:
The field will remain in the hierarchy and in the table, but it will no longer appear in Power View as a separate field.
Hiding Hierarchies from the Diagram View
You may not always want your hierarchies to be visible in the diagram view. In this case you can choose to hide—or display—hierarchies like this:
All hierarchies will be hidden in all tables in the data model. To make the hierarchies reappear, simply ensure that the Hierarchies check box is ticked.
Key Performance Indicators (KPIs)
PowerPivot is designed to handle very large amounts of data. Power View is built to allow you to view the salient points in the data. Nonetheless, it can get hard to track and remember many key numbers in a large data set where you have tens, if not hundreds, of important figures to follow.
This is where Key Performance Indicators (KPIs) step in. KPIs, are visual indicators of essential metrics in your data. You set the KPI to indicate whether your sales are on target, for instance. Or you can set a KPI (as we will do now) to keep an eye on gross margins. Then you will display the KPI as a very visual alert in Power View, as you saw in Chapter 2, Figure 2-30.
So what we will do here is create a KPI using the AvgGrossMargin calculated field from the sample data. We will then tell PowerPivot that we are aiming for a gross margin of £15,000.00, and that we will accept a range of £11,000.00 to £16,000.00 as acceptable. Anything below £11,000.00 is unacceptable and will trigger a visual warning; anything over £16,000.00 will be flagged as good news.
Figure 11-7. The Key Performance Indicator (KPI) dialog
The KPI has been created, and a small icon with three colors appears to the right of the calculated field that you used as the basis for the KPI. You can now use the Status, Value, and Goal fields in Power View table and matrix visualizations.
Note KPIs require that the input data be from a calculated field. You cannot just use a column of data for this.
When building this first KPI we bypassed some variations on a theme that you may find interesting. It follows that there are a couple of techniques that you could be tempted to apply when developing your own KPIs.
First, there is the question of the number of status thresholds, and consequently, the number of status icons that you can use. The choice is fairly simple:
Once you have decided whether to use three or five icons, you can choose how the base field relates to the target:
You make these four choices by clicking on one of the four status threshold icons in the Key Performance Indicator (KPI) dialog. Figure 11-8 shows the status threshold pane of the Key Performance Indicator (KPI) dialog for a five icon choice. As you can see, you can then adjust the four thresholds that define the five ranges that will be represented by the appropriate icon.
Figure 11-8. The Key Performance Indicator (KPI) options
The icon selections are described in Table 11-2.
Table 11-2. KPI Options
Indicator |
Type |
Comments |
---|---|---|
Circles |
Three |
Simple colored circles |
Flags |
Three |
Colored flags |
Indicators in a circle |
Three |
Symbols in a colored circle |
Traffic Light |
Three |
Uses a traffic light style |
Symbols |
Three |
Colored symbols |
Shapes |
Three |
Colored shapes |
Stars |
Three |
Stars of a single color that fill up to indicate the status |
Bars |
Five |
Progressively increasing bars of a single color |
Squares |
Five |
Progressively filled-in squares of a single color |
Pies |
Five |
Progressively filled-in pies of a single color |
Creating KPIs is easy, as you have just seen. Remembering the details later can be harder, however. Because of this, PowerPivot lets you add comments to each KPI you create. You can add comments while you are creating a KPI or at a later date, of course. Here, however, I will show you how to add comments to an existing KPI.
Figure 11-9. The Key Performance Indicator (KPI) dialog Descriptions pane
To return to the main (status) pane of this dialog, all you have to do is click on KPI Status at the top of the dialog.
Note The KPI description will be used as a tooltip in Power View.
On some occasions you could have a variable rather than a fixed target for a KPI that you are creating. Creating one of these is a virtually identical process to the one that you followed previously in the “Creating a KPI” section, but here are a few minor differences:
Once you have done completed these steps, the KPI dialog should look like Figure 11-10, where I have also changed the icon style.
Figure 11-10. The Key Performance Indicator (KPI) options for calculated KPI targets
Of course, you may need to tweak a KPI once you have created it. This is why it is so important to be aware of the KPI icon that can appear to the right of calculated fields. This icon is shown in Figure 11-11.
Figure 11-11. The Key Performance Indicator (KPI) icon for a calculated field
To modify a KPI, all you have to do is
IF a KPI has become redundant, you can delete it:
Note The KPI will be deleted without any confirmation or warning. Deleting a KPI will not affect the calculated field(s) it was based on.
It is easy, with a little practice, to develop quite complex PowerPivot data models that contain dozens of tables and hundreds of columns. This is fabulous for defining a “single version of the truth”—the Eldorado of business intelligence. The downside is that the sheer complexity and breadth of a large model can become difficult for end users to navigate.
So the PowerPivot team came up with a solution. The answer is to create and use perspectives. A perspective is a subset of the tables and columns that are particularly relevant to a group of users. Once a set of perspectives has been created, users can switch from one to another and thus only see the data that is relevant to a specific type of analysis.
Here is how you can create a perspective:
The Perspectives dialog will look like Figure 11-12.
Figure 11-12. The Perspectives dialog
Nothing has happened when you created a perspective—yet. So now it is time to use the perspective that you just created.
Figure 11-13. Applying a perspective
Any tables that were not selected as part of the perspective will temporarily be hidden, and any columns in the visible tables that you did not select will also be hidden from view. So you, or your users, can perform any required analyses on a simpler data model. Of course, you can return to the initial, default perspective at any time by selecting it from the list of available perspectives.
Note Power View can currently only use the default perspective.
A few pages ago we indicated to PowerPivot that certain fields were to be treated as images. Well, these are not the only hints that you can give to PowerPivot concerning the data a field contains. You can choose to apply to your data to several data categories, and this can only help Power View or Power Map (more on that in Chapter 13) display the sort of results that you really want.
Because we have already seen how to apply data categories in the context of images, I will not repeat the process, but will explain, in Table 11-3, what the remaining data categories are and when you should use them.
Table 11-3. Data Category Options
Data Category |
Type |
Comments |
---|---|---|
Address |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
City |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
Company |
Organization |
A custom category |
Continent |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
Country/Region |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
County |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
Date |
Date |
Tells Power View that the column contents are a date |
Image |
Image Reference |
Tells Power View that the column contents are to be displayed as an image |
Image URL |
Image Reference |
Tells Power View that the column contents are to be displayed as an image |
Latitude |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
Longitude |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
Organization |
Organization |
A custom category |
Place |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
Postal Code |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
Product |
Product |
A custom category |
State or Province |
Geography |
Indicates to Power View maps and Power Map that this field can be used by Bing maps |
Web URL |
Hyperlink |
Tells Power View that data in this field is a hyperlink and that it can be clicked to open the URL |
Hiding Columns from Client Tools
If you have spent a few minutes preparing your data set so that it is ready to be used in Power View (or many other visualization tools), then you could be looking at many, many columns of data. Some of these columns might be necessary for the data visualization to work effectively, but you and your users might not need to see them in Power View. Such columns have become part of the infrastructure of the data but are not required in the tables and charts that you will be creating.
So, if you want to remove extraneous clutter from your data set and hide columns that (and this list is not exhaustive)
The selected column will be grayed out in the table, and it remains in the data set. It will not, however, be visible in Power View or the PowerPivot table Field List.
Tip You can Ctrl-click to select multiple fields, even from several tables, before right-clicking and choosing to hide the fields from client tools.
If you are using this chapter to prepare the PowerPivot data model that is used as the basis for all the Power View examples in the previous chapters, then you will need to hide a certain number of columns from Power View. These columns are given in Table 11-4. Be aware, though, that you will need to create some of these columns in the data model before you can hide them from the user. This is explained in the previous chapter.
Table 11-4. Columns to Hide to Prepare the Data Model
Table |
Column |
---|---|
SalesData |
CarAge |
SalesData |
CarAgeBucketSort |
SalesData |
MileageRangeSort |
SalesData |
RegistrationDate |
Colours |
ColourID |
Countries |
CountryID |
Clients |
ClientID |
Clients |
Address1 |
Clients |
Address2 |
Clients |
CountryID |
Clients |
IsCreditWorthy |
Clients |
Website |
Preparing Data for Natural Language Querying
When you delve into Chapter 15 you will see, amongst other things, how you can query the data in a PowerPivot data model using natural language querying – that is without needing to learn a computer language to return data. However as humans (and not computers) we tend to use many words to describe the same thing, or even to use idiosyncratic vocabulary when defining data models.
PowerPivot lets you add synonyms for the names of objects (that is essentially columns and tables) to a data model. This helps the natural language querying engine interpret requests with greater accuracy. If you want to extend the data model with synonyms
Some synonyms are in bold. These are autogenerated by PowerPivot, and cannot be modified.
Note Synonyms are only available in Office 365 Professional Plus or the Click-to-Run version of Office 2013 Professional Plus. Also you must be signed in to Office 365 with an organizational account.
Although it is not something which springs to mind when initially creating a PowerPivot data model, file size can be important. It is altogether too easy to create huge data models in PowerPivot, if only because the existing Excel worksheet limits are no longer a constraint. In practice, though, you are probably better off trying to create the most compact data model that you can.
There are several reasons for this:
To end this chapter, here are a few techniques and ideas to help you create lean and efficient data models in Power Pivot. This list is not exhaustive, by any means, but it should point you in the right direction.
On a more general note, you can also reduce the size of Excel worksheets by removing
File size optimization is a never-ending cycle of trial and error. However, if you are aware of the basic techniques that can be used, then you should be able to build more efficient data models from the ground up.
Conclusion
So that is all there is to preparing a data set for top of the range Power View reports. As long as you have set any relevant table behavior options to define default field sets and columns containing images, for instance and applied Sort By columns and data categories, then your data should appear correctly and with greater panache. To finish, add a few Key Performance Indicators, ensure that any columns that are not really required by users are hidden, and make sure all useful hierarchies are set up; if you do all this, your audience will be really impressed. If as well you have prepared the data for natural language querying and have structured it so that there is no data redundancy, then you have really done a good job.
18.116.63.191