©  Adam Aspin 2018
Adam AspinPro Power BI Desktophttps://doi.org/10.1007/978-1-4842-3210-1_10

10. Creating a Data Model

Adam Aspin
(1)
Stoke-on-Trent, Staffordshire, UK
 
You need only one thing to create stunning visualizations and that is good data. Specifically, you need clean and accurate data loaded into the Power BI Desktop data model. There you can hone tens of millions of rows from multiple data sources into a coherent and powerful framework on which you can build your analyses and dashboards.
Finding, editing, and loading the source data using queries (as you learned in Chapters 2 through 9) is fundamental to preparing your data for the data model. However, the process does not stop there. Once you have made accessible the data that you need, you still have a few more tasks to carry out. To complete the process, you need to assemble these queries into a coherent structure that is clear and comprehensible to anyone who uses it.
To guarantee that you are at ease when creating a data model in Power BI Desktop for later analysis in your dashboards, this chapter introduces you to some of the techniques that you need to apply to model your data. You will discover how to take the data tables that you loaded and convert them into a structured dataset in the Power BI Desktop data model. This framework enables you to deliver information, insight, and analysis from the data in the tables. This learning curve covers
  • Manipulating tables
  • Specifying data types
  • Formatting data in the data model
  • Establishing relationships between the tables
  • Defining hierarchies
  • Categorizing data
  • Adding “sort by” columns that ensure the correct sort order in dashboard elements
  • Creating groups based on column values
The fourth point in this list is probably the most fundamental aspect of data modeling. To report on data accurately and precisely, you must allow Power BI Desktop to understand how the data in one table is linked to the data contained in another table. Chaining one table to another will let you use the data to deliver accurate and cogent results.
Once again, all the sample files used in this chapter are available on the Apress web site. Once downloaded, they should be in the C:PowerBiDesktopSamplesCH10 folder.

Data Modeling in the Power BI Desktop Environment

Before leaping into the detail of what you can do to create and enhance a data model, I think that it is best if you first familiarize yourself with the tool itself so that you can feel at home in this new environment.

The Power BI Desktop Data View

To start using Power BI Desktop for data modeling , you need to follow these steps:
  1. 1.
    Run Power BI Desktop and load any queries that you need to access your data. To follow the examples in this chapter, load the C:PowerBiDesktopSamplesCH10CarSalesDataFromDataModel.pbix file.
     
  2. 2.
    If you are using your own data, make sure that you have closed the Power BI Desktop Query window, applying any changes that you have made, and reverted to the Power BI Desktop environment.
     
  3. 3.
    Click the Data View icon on the left. You will switch to the Data View, where you can see all the available tables in the Fields list on the right. One of the tables will be selected and its data will be visible in the Data pane, as shown in Figure 10-1.
     
A370912_2_En_10_Fig1_HTML.jpg
Figure 10-1.
Data View
Note
Modeling data is only possible once data has been loaded into the Power BI Desktop data model. By this I mean that editing data in the Query Editor and returning to the Power BI Desktop window without applying your changes will hinder your data modeling—or at the very least will prevent you from returning accurate results.

Data Model or Query?

You could well be thinking that Data View does not look like very much at all yet. If anything, it looks like an extension of the Power BI Desktop Query window. However, be reassured, you will soon see what you can do in Data View, and exactly how powerful a tool Power BI Desktop really is when it comes to data modeling. For the moment, it is essential to remember that you have (so to speak) opened a door into the engine room of Power BI Desktop. Although this new world is part of Power BI Desktop (and you can return to Power BI visualizations instantaneously just by clicking the Report View icon above the Data View icon), it is best if you consider it a kind of parallel universe for the moment. This universe has its own ribbons and buttons and is separate from the dashboard View so that you can concentrate on enhancing the data and data structure without getting distracted.
The Power BI Desktop Data View is also different from the Power BI Desktop Query Editor. For instance, one major difference between the Power BI Desktop Query window and the Data View is that in Data View you are looking at all your data. The Query window only ever shows a sample of data. Once you load the data into the data model, you finally have access to the entire dataset (all the data in all the tables), and this is what you can see in the Data View.
A second difference is that once you have left the Query window, you are always working with the entire dataset and only filtering it as required by specific dashboards or visualizations. So you need to be aware that any filters that you apply in the Query window limits the data that can be displayed in Power BI Desktop. By contrast (and as you will see later in this book), you can apply subsequent filters to entire dashboards or specific visualizations once you have defined your core data and created the data model.
There are some areas where Power BI Desktop Query can do some of the things that can also be done in the Data View or Relationships View. For instance, you can create calculated columns in both (you will see how to do this in Data View in the following chapter). My advice is to try to remember that Power BI Desktop Query is for finding, filtering, and mashing up data, whereas Data View is for refining and calculating the metrics in your data model. However, each user can take the approach that they prefer and carry out any necessary calculations in either the Query Editor or the Data View. After all, it is the result that counts.
Note
Always make sure that you have clicked Close & Apply when using the Query Editor before modeling the data. This guarantees that you are working on the most up-to-date version of the data transformations that you have made, as well as the most recent data.

The Power BI Desktop Data View Ribbons

So what, exactly, are you looking at when you start Power BI Desktop? Essentially, you can see two ribbons , which are all devoted to data management:
  • The Home ribbon
  • The Modeling ribbon
Since the Home ribbon is principally used when creating reports and dashboards, I will explain it in Chapter 14. I prefer to explain the Power BI Desktop Modeling ribbon as we start out in this chapter, because you will use it extensively in the next few pages. This should make understanding the Power BI Desktop environment easier. However, if you prefer to skip this section (or possibly use it as reference later), then feel free to jump ahead.

The Modeling Ribbon

The Modeling ribbon is used to categorize and organize data and tables as well as to add calculated columns and additional metrics. The buttons that it contains are shown in Figure 10-2 and explained in Table 10-1.
A370912_2_En_10_Fig2_HTML.jpg
Figure 10-2.
Buttons in the Modeling ribbon
Table 10-1.
The Modeling Ribbon Buttons
Button
Description
Manage Relationships
Lets you join tables as well as delete these joins (called relationships) and modify them.
New Measure
Used to add a new value or calculation to a table.
New Column
Adds a new calculated column to a table.
New Table
Adds a new table
Sort By Column
Specifies that the data in one column be used as the basis for sorting data in another column.
Data Type
Lets you define the data type for a column.
Format
Lets you specify how numbers are formatted.
Home Table
Lets you select the table that will contain a measure.
Data Category
Allows you to define that a certain column is of a specific category. It is mostly used with geographical data for mapping.
Default Summarization
Lets you define the default aggregation that is applied in dashboard visualizations .
Manage Roles
Allows you to restrict data access for given users.
View As Roles
Lets you see the data available for a given role or roles.
New Group
Allows you to create groups from the values in a column.
Edit Groups
Allows you to modify groups from the values in a column .

Managing Power BI Desktop Data

Assuming that all has gone well, you now have a series of tables from various sources successfully added to your Power BI Desktop data model. If you have opened the file C:PowerBiDesktopSamplesCH10CarSalesDataFromDataModel.pbix you can see a set of data tables in the Fields list at the right of the Power BI Desktop window. Clicking a table will display the data from that table in the central area of the Data View window. It will soon be time to see what you can do with this data, but first, to complete the roundup of overall data management, you need to know how to do the following:
  • Rename tables
  • Delete tables
  • Move tables
  • Move around a table
  • Rename a column
  • Delete columns
  • Set column width
Note
When loading and editing source data, I generally use the term dataset to describe the source data. Now that the data is finally loaded, I will use the term (data) table to describe the data entities that Power BI Desktop uses when shaping and extending the data model. This is because a table is, virtually always, the output resulting from all transformations produced by a query.

Manipulating Tables

Let’s begin by seeing how you can tweak the tables that you have successfully imported and transformed.

Renaming Tables

Suppose that you wish to rename a table that you previously imported using Power BI Desktop Query . These are the steps to follow:
  1. 1.
    Right-click the table name in the Fields list at the right of the Power BI Desktop window.
     
  2. 2.
    Select Rename. The current name will be highlighted in the Fields list.
     
  3. 3.
    Enter the new name or modify the existing name.
     
  4. 4.
    Press Enter.
     
This also renames the query on which the table is based. In essence, Power BI will let you rename datasets either as queries in the Power BI Desktop Query window or in the Data window. Because the query is the table, renaming one renames the other.

Deleting a Table

Deleting a table is virtually identical to the process of renaming one—you just right-click the table name tab and select Delete instead of Rename. As this is a potentially far-reaching operation, Power BI Desktop will demand confirmation.
Note
When you delete a table, you are removing it from Power BI Desktop completely. This means that it is also removed from the set of queries that you may have used to transform the data. So you need to be careful when deleting tables, because you could lose all your carefully wrought transformation steps as well.

Selecting a Column from the List of Available Column Names

If you want to leap straight to a column of data (and presuming you know which column contains the data that interests you), all you have to do is
  1. 1.
    In the Fields list, click the expand triangle to the left of the table containing the field that you want to jump to. A list of the fields in the table appears, as shown in Figure 10-3.
    A370912_2_En_10_Fig3_HTML.jpg
    Figure 10-3.
    The list of columns in a table
     
  2. 2.
    Click the name of the field that contains the data that you want to study. The field will be selected in the table.
     

Manipulating Columns

Now let’s see how to perform similar actions—but this time inside a table—to the columns of data that make up the table.

Renaming a Column

Renaming a column is pretty straightforward. All you have to do is
  1. 1.
    In the Fields list, right-click the title of the column that you wish to rename. In this example, it is the field ClientType in the Data Clients table. The column will be selected and the context menu will appear. This is shown in Figure 10-4.
    A370912_2_En_10_Fig4_HTML.jpg
    Figure 10-4.
    The Power BI Desktop context menu
     
  2. 2.
    Select Rename from the context menu. The current column name will be highlighted.
     
  3. 3.
    Type in the new name for the column (TypeOfClient) and press Enter.
     
And that is it. Your column has been renamed; it is also selected. You cannot use the name of an existing column in the same table, however. If you try, Power BI Desktop will display the dialog that you can see in Figure 10-5.
A370912_2_En_10_Fig5_HTML.jpg
Figure 10-5.
The Rename column dialog
If this happens, just click Close and start over using a different new column name .
Although renaming a column may seem trivial, it can be important. Consider other users first; they need columns to have instantly understandable names that mean something to them. Then there is the PowerBI.com Q&A natural language feature. This only works well if your columns have the sort of names that are used in the queries—or ones that are recognizable synonyms. Finally, you cannot rename columns individually in separate visualizations, so you really need to give your columns the names that you are happy seeing standardized across all the dashboards that are based on this dataset. The upside to this is that the same data will always have the same name, thus removing potential incomprehension or confusion on the part of the audience-and yourself.
You can also rename a column by right-clicking the column title that interests you and selecting Rename from the context menu .
Tip
Power BI Desktop is very forgiving when it comes to renaming columns (or, indeed, tables and calculations too). You can rename most elements in the Query window, the Data View, the Report View, or the Relationships View (depending where they were added) and the changes will ripple through the entire Power BI Desktop data model. Better still, renaming columns, calculations, and tables generally does not cause Power BI Desktop any difficulties if these elements have already been applied to dashboards, as these are also updated to reflect the change of name. Just be aware that if you customize a lot of names and don’t document the columns for reference, you will find that reverse engineering a report can be quite a task. In practice, it can save you a lot of effort if you always keep a reference of alias columns back to source in the query.

Deleting Columns

Deleting a column is equally easy. You will probably find yourself doing this when you bring in a column that you did not mean to import or when you find that you no longer need a column. So, to delete a column, you need to do the following:
  1. 1.
    Right-click the title of the column that you wish to delete. Alternatively, right-click the field name in the Fields list. The column will be selected and the context menu will appear.
     
  2. 2.
    Select Delete Columns from the context menu. Unless the column is empty, the confirmation dialog will appear as shown in Figure 10-6.
    A370912_2_En_10_Fig6_HTML.jpg
    Figure 10-6.
    Deleting a column
     
  3. 3.
    Click OK. The column will be deleted from the table.
     
Deleting unused columns is good practice because you will
  • Reduce the memory required for the dataset.
  • Speed up data refresh operations .
  • Reduce the size of the Power BI Desktop file.
Note
Deleting a column really is permanent. You cannot use the undo function to recover it. Indeed, refreshing the data will not add the column back into the table either. If you have deleted a column by accident, you can choose to close the Power BI Desktop file without saving and reopen it, thus reverting to the previous version. Otherwise, you can return to the source query and add the column name back into the query.

Moving Columns

Once in the data model , you cannot move columns around. So if you want to change the column order for any reason, you have to switch to the Power BI Desktop Query Editor and move the column there. Once you save and apply your changes, the modified column order is visible in the data model. However, columns will always appear in alphabetical order in the Fields list in Report View.

Setting Column Widths

One final thing that you may want to do to make your data more readable—and consequently easier to understand—is to adjust the column width. I realize that as an Excel or Word user , you may find this old hat, but in the interests of completeness, this is how you do it:
  1. 1.
    Place the mouse pointer over the right-hand limit of the column title in the column whose width you want to alter. The cursor will become a two-headed arrow.
     
  2. 2.
    Drag the cursor left or right.
     
Note
You cannot select several adjacent columns before widening (or narrowing) one of them to set them all to the width of the column that you are adjusting. You can double-click the right-hand limit of the column title in the column whose width you want to alter so that Power BI Desktop can set the width to that of the longest element in the column.

Power BI Desktop Data Types

When you are importing data from an external source, Power BI Desktop tries to convert it to one of the nine data types that it uses. These data types are described in Table 10-2.
Table 10-2.
Power BI Desktop Data Types
Data Type
Description
Decimal Number
Stores the data as a real number with a maximum of 15 significant decimal digits. Negative values range from –1.79E +308 to –2.23E –308. Positive values range from 2.23E –308 to 1.79E + 308.
Fixed Decimal Number
Stores the data as a number with a specified number of decimals.
Whole Number
Stores the data as integers that can be positive or negative but are whole numbers between 9,223,372,036,854,775,808 (–2^63) and 9,223,372,036,854,775,807 (2^63–1).
Date/Time
Stores the data as a date and time in the format of the host computer. Only dates on or after the 1st of January 1900 are valid.
Date
Stores the data as a date in the format of the host computer.
Time
Stores the data as a time element in the format of the host computer.
Text
Stores the data as a Unicode string of 536,870,912 bytes at most.
True/False
Stores the data as Boolean—true or false.
Binary
Stores the data as binary (machine-readable) data.

Formatting Power BI Desktop Data

Power BI Desktop allows you to apply formatting to the data in the tables that it contains. When you format the data in the Power BI Desktop data model, you are defining the format that will be used in all visualizations in all the dashboards that you create using this metric. So it is probably worth learning to format data for the following reasons:
  • You will save time and multiple repetitive operations when creating reports and presentations by defining a format once and for all in Data View. The data will then appear using the format that you applied in multiple visualizations in this Power BI Desktop file.
  • It can help you understand your data more intuitively if you can see the figures in a format that has intrinsic meaning.
This explains how to format a column (of figures in this example):
  1. 1.
    Assuming that you are working in the Power BI Desktop Data View, click the table name in the Fields list that contains the metric you wish to format.
     
  2. 2.
    Click inside the column that you want to format (SalePrice from the InvoiceLines table in the CarSalesDataFromDataModel.pbix sample file).
     
  3. 3.
    In the Modeling ribbon, click the Thousands Separator icon (the comma in the Formatting section). All the figures in the column will be formatted with a thousands separator and two decimals.
     
The various formatting options available are described in Table 10-3.
Table 10-3.
Currency Format Options
Format Option
Icon
Description
Example
General
 
Leaves the data unformatted.
100000.01
Currency
A370912_2_En_10_Figa_HTML.jpg
Adds a thousands separator and two decimals as well as the current monetary symbol.
'100,000.01
Date/Time
 
Formats a date and/or time value in one of a selection of date and time formats.
 
Decimal Number
A370912_2_En_10_Figb_HTML.jpg
Adds a thousands separator and two decimals.
100,000.01
Whole Number
 
Adds a thousands separator and truncates any decimals.
100,000
Percentage
A370912_2_En_10_Figc_HTML.jpg
Multiplies by 100, adds two decimals, and prefixes with the percentage symbol.
28.78%
Scientific
 
Displays the numbers in scientific format.
1.00E+05
Text
 
Formats the column contents as text.
 
Binary
 
Leaves the column contents as a binary representation of the data.
 
True/False
 
Formats the column contents as True/False.
 
Decimal Point
A370912_2_En_10_Figd_HTML.jpg
Increases or reduces the number of decimals.
 
If you wish to return to “plain vanilla” data, then you can do this by selecting the General format. Remember that you are not in Excel, and you cannot format only a range of figures—it is the whole column or nothing. Also, there is no way to format nonadjacent columns by Ctrl-clicking to perform a noncontiguous selection. And, you cannot select multiple adjacent columns and format them in a single operation.
By now, you have probably realized that Power BI Desktop operates on a “Format once/apply everywhere” principle. However, this does not mean that you have to prepare the data exhaustively before creating dashboards and reports. You can flip between the data model and the Report View at any time to select another format, secure in the knowledge that the format that you just applied is used throughout all of your dashboards wherever the relevant metric is used.
Note
Numeric formats are not available for selection if the data in a field is of text or data/time data type. Similarly, date and time formats are only available if the column contains data that can be interpreted as dates or times. Only Boolean (True/False) data types can be formatted as True/False. Finally, only columns containing binary data can be formatted as binary.

Currency Formats

Power BI Desktop will propose a wide range of currency formats . To choose the currency that you want:
  1. 1.
    Click the popup (the downward-facing triangle) to the right of the currency format icon. This will display a list of available formats.
     
  2. 2.
    Select the currency symbol that you want, or scroll through the list to view all the available currency formats, as shown in Figure 10-7, then click OK.
     
A370912_2_En_10_Fig7_HTML.jpg
Figure 10-7.
The currency format popup list
Note
The thousands separator that is applied, as well as the decimal separator, depends on the settings of the PC on which the formatting is applied.

Preparing Data for Dashboards

Corralling data into a structure that can power your dashboards necessitates a good few tweaks above and beyond specifying data types and formats for final presentation. As part of the groundwork for your dashboards, you could also have to
  • Categorize data.
  • Apply a default summarization.
  • Define Sort By columns.
  • Hide fields.
These ideas probably seem somewhat abstruse at first sight, so let’s see them in action to make it clear why you need to add these touches to your data model.

Categorizing Data

Power BI dashboards are not just made up of facts and figures. They can also contain geographical data or hyperlinks to web sites or documents. While we humans can recognize a URL pretty easily and we can guess that a column with postcodes contains, well, postcodes, such intuitions may not be quite as self-evident for a computer.
So, if you want Power BI to be able to add maps or hyperlinks (for instance), you will make life easier for both you and the application if you categorize any columns that contain the types of data that are used for maps and links.
For instance, suppose that you want to prepare the Countries table as a potential data source for a dashboard map (and assuming that you have loaded the CarSalesDataFromDataModel.pbix sample file):
  1. 1.
    In the Data View, select the Countries table in the Fields list. The data in the Countries table will appear in the center of the Power BI Desktop window.
     
  2. 2.
    Click inside the CountryName column. The column will be highlighted.
     
  3. 3.
    In the Modeling ribbon, click the popup to the right of the Data Category button. You can see this context menu in Figure 10-8.
    A370912_2_En_10_Fig8_HTML.jpg
    Figure 10-8.
    The Data Category options
     
  4. 4.
    Select Country/Region from the menu.
     
The ribbon will show Data Category: Country/Region. This means that Power BI Desktop now knows to use the contents of this field as a country when generating maps in dashboards.
The Data Category options that are available are described in Table 10-4.
Table 10-4.
Data Category Options
Data Category Option
Description
Uncategorized
Applies to data that is not used for hyperlinks or creating maps.
Address
Specifies an address for mapping.
City
Specifies a city for mapping.
Continent
Specifies a continent for mapping.
Country/Region
Specifies a country or region for mapping.
County
Specifies a county for mapping.
Latitude
Specifies a latitude for mapping.
Longitude
Specifies a longitude for mapping .
Place
Specifies a location or place for mapping.
Postal Code
Specifies a postal (ZIP) code for mapping.
State or Province
Specifies a state or province for mapping.
Web URL
Indicates a URL for a hyperlink.
Image URL
Indicates a URL for the source of an image in a dashboard.
Barcode
Specifies that the field contains a barcode .
Note
Not specifying a data category does not mean that Power BI Desktop cannot create maps in dashboards or recognize URLs. However, the results cannot be guaranteed of a reasonable chance of success unless you have indicated to the application that a column contains a certain type of data.

Applying a Default Summarization

When you are creating dashboards, you are often aggregating numeric data. Most times, this means adding up the figures to return the column total. However, there could be columns of data where you want another aggregation applied. Do the following to set your own default aggregation (assuming that you have loaded the CarSalesDataFromDataModel.pbix sample file):
  1. 1.
    In the Data View, select the Colors table in the Fields list. The Colors data will appear in the center of the Power BI Desktop window.
     
  2. 2.
    Click inside the Color column. The column will be highlighted.
     
  3. 3.
    In the Modeling ribbon, click the popup to the right of the Default Summarization button. The Default Summarization popup menu appears, as shown in Figure 10-9.
    A370912_2_En_10_Fig9_HTML.jpg
    Figure 10-9.
    The Default Summarization popup menu
     
  4. 4.
    Select Count from the context menu.
     
The ribbon will show Default Summarization: Count.
The Default Summarization options that are available are described in Table 10-5.
Table 10-5.
Default Summarization Options
Default Summarization Option
Description
Don’t Summarize
The data in this column is not summarized.
Sum
The data in this column is added (summed).
Average
The average value for the data in this column is returned.
Minimum
The minimum value for the data in this column is returned.
Maximum
The maximum value for the data in this column is returned.
Count
The number of elements in this column is returned.
Count (Distinct)
The number of individual (distinct) elements in this column is returned.
Obviously, you can only apply mathematical aggregations to numeric values. However, you can apply counts to any type of data .
Note
Specifying a default aggregation does not prevent you from overriding the default in dashboards. It merely sets a default that is applied as a standard when aggregating data from a column. In the real world, this can be really useful because it reduces the time you spend building dashboards.

Defining Sort By Columns

Sometimes you will want to sort data in a dashboard visualization based not on the contents of the selected column , but by the contents of another column. As an example, imagine that you have a table of data that contains the month for a sale. If you sort by month, you probably do not want to see the months in alphabetical order, starting with April. In cases such as this, you can tell Power BI Desktop that you want to sort the month name element by the month number that is contained in another column.
  1. 1.
    Load the C:PowerBiDesktopSamplesCH10CarSalesDataFromDataModel.pbix sample file (unless you have already loaded it, of course).
     
  2. 2.
    In the Data View, select the DateDimension table in the Fields list. The date data will appear in the center of the Power BI Desktop window.
     
  3. 3.
    Click inside the MonthFull column. The column will be highlighted.
     
  4. 4.
    In the Modeling ribbon, click the Sort By Column button and select MonthNum from the list of available columns.
     
Now if you sort by MonthFull in a visualization, you see the months in the order that you probably expect—from January to December. Had you not applied a Sort By column, then calendar months would have been sorted in alphabetical order, which is from April to September! Once again, this choice applies to any visualization that you create in a Power BI Desktop dashboard that is based on this data model. So remember to add numeric sort columns alongside textual columns, such as dates and so forth, at the source.
To remove the Sort By column , leave the column selected and simply click the Sort By Column button again and select the original column name from the list of available columns.
Tip
If you want to see which column has been set as a Sort By column, simply click inside the column to be sorted and then click the Sort By Column button. The popup list of columns shows a tick to the left of the column that is being used to sort the selected column.
The sample file for this chapter (CarSalesDataFromDataModel.pbix) already contains columns that you can use as Sort By columns. In the real world, your source data might not always be this instantly useable. So remember that you can always switch to the Power BI Desktop Query Editor and enhance source tables with extra columns that you can then use to sort data in Data View.
Note
In Chapter 13 I will explain ways of creating date tables like the one you just saw.

Hiding Tables and Fields from the User

A data model can conceivably contain many more columns than those that you need to create reports and dashboards. These could include (among others):
  • Sort By columns
  • Columns that already appear in hierarchies (as you will see in a couple of pages’ time)
  • Columns that contain intermediate calculations (these are explained in the next chapter)
  • Columns that contain data that you have loaded “just in case” but don’t need yet
It follows that, in the interests of clarity, you may want to hide columns (or even entire tables) that are not strictly relevant to your analysis. This not only reduces clutter, it can avoid confusion when new users start to work with a data model that you have created. It also guarantees that only essential data will be available.

Hiding Columns and Tables

As a first step, you need to flag columns (or fields, if you prefer) and tables as hidden. Here is how:
  1. 1.
    Click the column or table that you want to hide.
     
  2. 2.
    Select Hide in Report View from the popup menu.
     
Hidden columns and tables will still appear in the Fields list for the moment, but will be grayed out. Hidden columns and the columns in hidden tables can, nonetheless, still be used in visualizations.
To redisplay a hidden column or table, simply right-click the requisite column or table and uncheck the Hide in Report View option in the popup menu.

Removing Hidden Columns and Tables from View

If your objective is to “declutter” the Fields list, you can then remove hidden columns and tables from view completely, as follows:
  • Click any table or field in the Fields list and uncheck the View Hidden option in the popup menu.
All the hidden columns and tables will disappear from the Fields list. To redisplay all the hidden columns and tables, simply check the View Hidden option in the popup menu .

Sorting Data in Power BI Desktop Tables

A Power BI Desktop table could contain millions of rows, so the last thing that you want to have to do is to scroll down through a random dataset. Fortunately, ordering data in a table is simple:
  1. 1.
    Right-click inside the column you want to order the data by. I will choose the Make column in the Stock table.
     
  2. 2.
    Click the Sort Ascending option in the context menu to sort this column in ascending (alphabetical) order.
     
The table is sorted using the selected column as the sort key, and even a large dataset appears correctly ordered in a very short time. If you want to sort a table in descending order (reverse alphabetical or largest to smallest order), click the Sort Descending option in the context menu.
At this juncture, you need to remember that the data model is not really designed for interactive data analysis. That is what dashboards are for. Consequently, you should not expect to use the data tables in Power BI Desktop as if they were vast Excel spreadsheets.
Tip
If you need a visual indication that a column is sorted, look at the right of the column name. You will see a small arrow that faces upward to indicate a descending sort or downward to indicate an ascending sort.
If you want to remove the sort operation that you applied and return to the initial dataset as it was imported, all you have to do is click the Clear Sort icon in the context menu for the column .
Note
You cannot perform complex sort operations; that is, you cannot sort first on one column, then carry out a secondary sort in another column (if there are identical elements in the first column). You also cannot perform multiple sort operations sorting on the least important column and then progressing up to the most important column to sort on to get the effect of a complex sort. This is because Power BI Desktop always sorts the data based on the dataset as it was initially loaded. Remember that you can add index columns (as you saw in Chapter 7) in Power BI Desktop Query and then sort on these if you want to reapply an initial sort order.

Adding Hierarchies

Organizing data can be fundamental when you want to “see the wood for the trees.” Consequently Power BI Desktop lets you create any hierarchy on the fly so that you can better appreciate the structure of the information that you are presenting.
To create a hierarchy:
  1. 1.
    Switch to Report View.
     
  2. 2.
    Select the Stock table.
     
  3. 3.
    Right-click the field that will become the top-level element in the new hierarchy (Make in this example).
     
  4. 4.
    Select New Hierarchy. A new hierarchy named Make Hierarchy will appear in the Stock table, containing the Make field as the top-level element in the hierarchy.
     
  5. 5.
    Drag the Model field onto the new hierarchy title. The Model field will be added to the hierarchy under the highest-level element. You can see this in Figure 10-10.
     
A370912_2_En_10_Fig10_HTML.jpg
Figure 10-10.
An added hierarchy
This is all that you have to do! You can now drag the hierarchy on to the report canvas to create a table that is based on the multiple elements that make up the hierarchy. The data can be used anywhere—and in any visual—where the multiple fields that make up the hierarchy would be used. They are particularly useful as the basis for matrices and drill-down charts, which you will discover in Chapters 15 and 16.
You can, of course, rename or delete the hierarchy or any level in a hierarchy just as you would any standard data field.
Tip
Adding a field to a hierarchy means that the field now exists twice in the table-once as a “stand-alone” field and once inside a hierarchy. In most cases it is best to hide the original version of the field so that users (and you) don’t see the field twice in the data model.

Creating and Modifying Groups

There could be times when you have multiple individual items in a list and you wish to gather separate sets of values into groups . This could be to get a higher-level view of separate subdomains or simply to isolate certain values from a recordset.
Tip
If you like, you can consider that groups are to “horizontal” organization what hierarchies are to “vertical” selection.

Creating a Group

Suppose that you will frequently be using a set of colors as the basis for tables and charts in your dashboards. Here is how to create a group containing a subset of the available colors :
  1. 1.
    Click the Data icon on the top left of the Power BI Desktop window.
     
  2. 2.
    Select the Colors table.
     
  3. 3.
    Click inside the Color column.
     
  4. 4.
    In the Modeling ribbon, click the New Group icon. The Groups dialog will appear.
     
  5. 5.
    In the Name field, enter a name for the group. I will use PowerColors in this example.
     
  6. 6.
    In the list of ungrouped values on the left, Control-click the following five elements:
    1. a.
      Black
       
    2. b.
      Blue
       
    3. c.
      British Racing Green
       
    4. d.
      Dark Purple
       
    5. e.
      Green
       
     
  7. 7.
    Click Group. The dialog will look like the one shown in Figure 10-11.
    A370912_2_En_10_Fig11_HTML.jpg
    Figure 10-11.
    The Groups dialog
     
  8. 8.
    Click OK. A new column will be created containing the group definitions, as you can see in Figure 10-12. The name of the group becomes the new column name.
    A370912_2_En_10_Fig12_HTML.jpg
    Figure 10-12.
    A group column
     
You can now use this column in visuals to group elements. You will see this in greater detail in Chapter 16.
There a couple of final points to note when creating groups :
  • Clicking the Include Other Group check box will take all remaining items in the original column and use them to create an “Other” group.
  • You can create multiple groups. Simple carry out steps 6 and 7 as many times as you want separate groups.
  • No value from the original column can appear more than once in a group .

Modifying a Group

Any existing group can be altered easily, simply by doing the following:
  1. 1.
    Click inside the column containing the group that you want to modify.
     
  2. 2.
    In the Modeling ribbon, click the Edit Groups icon. The Groups dialog will appear.
     
  3. 3.
    Remove any items from the group by clicking the value in the Groups and Members list on the right. Then click Ungroup.
     
  4. 4.
    Add any new values by selecting the item on the left and, once you have selected the appropriate group on the right, clicking the Group button.
     
  5. 5.
    Click OK. The modified group will be applied to the selected column.
     

Deleting a Group

As a group is, to all intents and purposes, a column, you delete a group exactly as you would delete a column :
  1. 1.
    In the Fields list, expand the table containing the new column. You will see that group columns have a small icon with two empty squares to the left of the field name.
     
  2. 2.
    Right-click the group column and select Delete.
     
  3. 3.
    Confirm the deletion .
     
Note
You can “mix and match” groups in a table. You can, for instance, allocate all the separate values in a table to different groups, or you can create multiple groups to handle certain values and nonetheless leave other values outside a group.

Designing a Power BI Desktop Data Model

Congratulations! You are well on the way to developing a high-performance data model for self-service business intelligence (BI) . You have imported data from one or even from several sources into the Power BI Desktop data model. You have taken a good look at your data using the Power BI Desktop data model window and you can carry out essential operations to rename tables and columns. The final step to ensure that your dataset is ready for initial use as a self-service BI data repository is to create and manage relationships between tables. This is a fundamental part of designing a coherent and useable dataset in Power BI Desktop.
Before leaping into the technicalities of table relationships, we first need to answer a couple of simple questions:
  • What are relationships between tables?
  • Why do we need them?
Table relationships are links between tables of data that allow columns in one table to be used meaningfully in another table. If you have opened the Power BI Desktop example file CarSalesDataFromDataModel.pbix, then you can see that there is a Stock table of stock data that contains a ColorID column, but not the actual color itself. As a complement to this, there is a reference table of colors, named Colors. It follows that, if we want to say what color a car was when it was bought, we need to be able to link the tables so that the Stock table can “look up” the actual color of the car that was bought. This requires some commonality between the two tables and, fortunately, both contain a column named ColorID. So if we are able to join the two tables using this field, we can see which color is represented by the color ID for each car in stock.
You can see another example of linking tables if you take a look at the Invoices and InvoiceLines tables. These two tables have been designed using a technique called relational modeling. Essentially this means that two tables have been created to avoid pointless data duplication. So any data that is used to describe an invoice (such as the invoice date or invoice number) is stored in the Invoices table, whereas all the details concerning the vehicles sold are held in a separate table named InvoiceLines. The two tables then share a field that allows them to be joined so that users can see the data from the two tables together if they need to. This means that any elements that would otherwise be repeated are stored in a “header” table such as the Invoices table and nonrepeating data is stored in a detail table (the InvoiceLines table in this example).
It is possible to store the data from these two tables as one table. However, this would mean repeating elements such as the invoice date or invoice number each time that an invoice contained more than one item. This would entail repeated data elements and vastly increased file sizes.
Clearly, these examples are extremely simple. However, they are not unduly contrived. They represent the way many relational databases store data. So there is every chance that you will see potential links or relationships like this in the real-world data that you import from corporate databases. In any case, if you want to use data from multiple sources in your data analysis, you will have to find a way to link the tables using a common field, like the ColorID field that I just mentioned. The reality may be messier (the fields may not have the same name in the two tables, for instance), but the principle always applies.
Tip
If you have the necessary permissions as well as the SQL knowledge, then you can join tables directly in the source database using a query. This way you can create fewer “flattened” tables in Power BI Desktop from the start and avoid having to create a spider’s web of new table relationships in Power BI Desktop.

Data View and Relationship View

Power BI Desktop lets you see your data model in two different yet complementary ways. When you need a high-level overview of all the data tables, you should use Relationship View, as this allows you to step back from the detail and look at the dataset as a whole. The following explains how to do this:
  1. 1.
    Click the Relationship View button on the left of the Power BI Desktop window. Power BI Desktop will display the tables in Relationship View, as shown in Figure 10-13.
     
A370912_2_En_10_Fig13_HTML.jpg
Figure 10-13.
Relationship View
As you can see from Figure 10-13, you are now looking at most or all of your tables, and although you can see the table and column names, you cannot see the data. Not only that, but you can also see that some tables are already joined—though not all of them. This is because Power BI Desktop always attempts to guess any possible relationships between tables and creates relationships automatically, if it can, to save you time and effort.
This is exactly what we need, because now it is time to think in terms of overall structures rather than the nitty-gritty. You can use this view to move and resize the tables. Moving a table is as easy as dragging the table’s title bar. Resizing a table means placing the pointer over a table edge or corner and dragging the mouse.
Tip
Although repositioning tables can be considered pure aesthetics, I find that doing so is really useful. A well-laid-out dataset design helps you understand the relationships between the tables and the inherent structure of the data.

Relationship View Display Options

The whole point of Relationship View is to let you get a good look at the entire dataset and, if necessary, modify the layout in order to see the relationships between tables more clearly.

Maximizing a Table

If you have many, many fields in a table, then you may occasionally need to take a closer look at a single table. Fortunately, the creators of Power BI Desktop have thought of this. So, to zoom in on a specific table:
  1. 1.
    Click the table that you wish to examine more closely. In this example, it will be the Invoices table.
     
  2. 2.
    Click the Maximize button to the right of the table name. The table will expand to give you a clearer view of the fields in the table, whereas the rest of the data model will be grayed out. You can see an example of this in Figure 10-14.
     
A370912_2_En_10_Fig14_HTML.jpg
Figure 10-14.
Maximizing a table

Minimizing a Table

To reset the table to its previous size, click the same icon—now called the Restore button—at the right of the table name .

Creating Relationships

Creating relationships is easy once you know which fields are common between tables. Since we already agreed that we need to join the Colors table to the Stock table, let’s look at how to do this using the file C:PowerBiDesktopSamplesCH10CarSalesDataForLinkingTables.pbix. This file contains the data tables that you saw up until now in this chapter, but without any of the relationships between the tables.
  1. 1.
    Open the Power BI Desktop file CarSalesDataForLinkingTables.pbix in the folder C:PowerBiDesktopSamplesCH10.
     
  2. 2.
    Drag the ColorID field from the Stock table over the ColorID field in the Colors table, as shown in Figure 10-15.
     
A370912_2_En_10_Fig15_HTML.jpg
Figure 10-15.
A table relationship
This is all that you have to do. The two tables are now joined and the data from both tables can be used meaningfully in reports and dashboards.
Note
Currently, in the Power BI Desktop data model, you can only join tables on a single field. You may need to take this into account when preparing queries in the Power BI Desktop Query Editor for later use in the data model.

Creating Relationships Manually

You do not have to drag and drop field names to create relationships. If you prefer, you can specify the tables and fields that will be used to create a relationship between tables. What is more, you do not have to be in Relationship View to do this. So, just to make a point and to show you how flexible Power BI Desktop can be, in this example, you will join the Invoices and InvoiceLines tables on their common InvoiceID field:
  1. 1.
    Select a table that you want to make appear in a relationship. I will use the Invoices table for this example.
     
  2. 2.
    Click the Manage Relationships button in the Home ribbon. The Manage Relationships dialog will appear.
     
  3. 3.
    Click New. The Create Relationship dialog will appear.
     
  4. 4.
    In the upper part of the dialog, select the Invoices table from the popup list of tables.
     
  5. 5.
    In the lower part of the dialog, select the InvoiceLines table as the related lookup table. The InvoiceID field should appear automatically as the field to join on (it will be selected in both tables). If Power BI Desktop has guessed the field, it will appear selected. If it does not, or if it has guessed incorrectly, you can always select the correct field for both tables. The Create Relationship dialog should look like Figure 10-16.
    A370912_2_En_10_Fig16_HTML.jpg
    Figure 10-16.
    The Create Relationship dialog
     
  6. 6.
    Click OK. The Create Relationship dialog will close and return you to the Manage Relationships dialog. It should look like Figure 10-17 at the moment. You can see that it indicates which column in which table is used to join to which other column in which other table.
    A370912_2_En_10_Fig17_HTML.jpg
    Figure 10-17.
    The Manage Relationships dialog
     
  7. 7.
    Click Close. The Manage Relationships dialog will close and the relationship will be created.
     

Creating Relationships Automatically

Whatever the data source, you can have Power BI Desktop detect the relationships automatically . This approach has a couple of advantages:
  • You avoid a lot of manual work.
  • You reduce the risk of error (that is, creating relationships between tables on the wrong fields, or even creating relationships between tables that are not related).
This technique is unbelievably easy. All you do is the following:
  1. 1.
    Click the Manage Relationships button in the Home ribbon. The Manage Relationships dialog will appear.
     
  2. 2.
    Click Autodetect. After a few seconds the dialog shown in Figure 10-18 will appear.
    A370912_2_En_10_Fig18_HTML.jpg
    Figure 10-18.
    The Autodetect dialog
     
  3. 3.
    Click Close. The Manage Relationships dialog will list all the relationships in the data model—both pre-existing relationships and those discovered by the autodetection process. The Manage Relationships dialog will now look like the one in Figure 10-19.
    A370912_2_En_10_Fig19_HTML.jpg
    Figure 10-19.
    The Manage Relationships dialog after automatically detecting relationships
     
  4. 4.
    Click Close to return to the Relationships View .
     
You will see that the tables you just imported already have the relationships generated in Power BI Desktop.

Deleting Relationships

In addition to creating relationships , you will inevitably want to remove them at some point. This is both visual and intuitive.
  1. 1.
    Click the Design View button in the Home ribbon. Power BI Desktop will display the tables in Relationship View.
     
  2. 2.
    Select the relationship that you want to delete. The arrow joining the two tables will become a double link, and the two tables will be highlighted.
     
  3. 3.
    Right-click and choose Delete (or press the Delete key). The Confirmation dialog will appear, as shown in Figure 10-20.
    A370912_2_En_10_Fig20_HTML.jpg
    Figure 10-20.
    The Delete Relationship dialog
     
  4. 4.
    Click Delete.
     
The relationship will be deleted. However, the tables will remain in the data model .

Managing Relationships

If you wish to change the field in a table that serves as the basis for a relationship, then you have another option. You can use the Manage Relationships dialog. This approach may also be useful if you want to create or delete several relationships at once. If you want to use this dialog:
  1. 1.
    In the Design tab, of the Home ribbon click Manage Relationships. The Manage Relationships dialog appears, as was shown previously in Figure 10-17.
     
  2. 2.
    Click the relationship you wish to modify.
     
  3. 3.
    Click Edit (or double-click the relationship). The Edit Relationship dialog appears (it is virtually identical to the Create Relationship dialog shown in Figure 10-16).
     
  4. 4.
    Continue modifying the relationship as described previously.
     
As you can see from this dialog, you also have the option of creating or deleting relationships. Since the processes here are identical to those I have already described, I will not repeat them.
The techniques used to create and manage relationships are not, in themselves, very difficult to apply. It is nonetheless absolutely fundamental to establish the correct relationships between the tables in the dataset. Put simply, if you try to use data from unconnected tables in a single Power BI Desktop dashboard, not only will you get an alert warning you that relationships need to be created, you will also get visibly inaccurate results. Basically, all of your analyses will be false. So it is well worth it to spend a few minutes upfront designing a clean, accurate, and logically coherent dataset .
Tip
Double-clicking a relationship in the Relationship View will also display the Edit Relationship dialog.

Deactivating Relationships

If you no longer need a relationship between tables but do not want to delete it, you also have the option of deactivating the relationship. This means that the relationship no longer functions , but that you can reactivate it quickly and easily.
Deactivating—or reactivating—a relationship is as simple as selecting or unselecting the box in the Active column of the Manage Relationships dialog (shown earlier in Figure 10-17).

Advanced Relationship Options

The Edit Relationship dialog contains a few advanced options that you could find useful on occasion. These include a couple of fundamental options that you have to take into consideration when defining the structure of a data model:
  • Cardinality
  • Cross filter direction
These are the subject of the next two subsections.

Cardinality

Cardinality defines how many columns in one table relate to the matching column in the linked table. You can see the available choices in the cardinality popup in Figure 10-21.
A370912_2_En_10_Fig21_HTML.jpg
Figure 10-21.
Cardinality in table relationships
These various options are described in Table 10-6.
Table 10-6.
Power BI Desktop Relationship Types
Relationship Option
Description
Many to One
Specifies that there are many records in one table for a single record that maps in the table that is joined.
One to One
Specifies that there is a single record in one table for a single record that maps in the table that is joined.
One to Many
Specifies that there is a single record in one table for many records that map in the table that is joined.
Power BI Desktop will nearly always detect the correct cardinality for a relationship. However, if you want to override the choice made by the software because you suspect that another type of cardinality will be required in the future, for instance, then you simply select the cardinality that you want from the popup list.

Cardinality Issues

Data is not always perfect, unfortunately. So there will be times when you will select the cardinality that you want to use to join two tables and will get an error message like the one shown in Figure 10-22.
A370912_2_En_10_Fig22_HTML.jpg
Figure 10-22.
Invalid cardinality
It is impossible to cover all the multiple reasons that can provoke this error. So, if you see this message, here are a few pointers to try and help you sort out this issue, should you encounter it:
  • For a one-to-many cardinality , ensure that the values for the field on the “one” side of the relationship exist for all the corresponding values on the “many” side.
  • For a one-to-one cardinality, check that there are no duplicate values for the fields on either side of the table relationship.
  • For a one-to-one cardinality, make sure that there are no empty or null values for the fields on either side of the table relationship.
  • For a one-to-many cardinality, ensure that the values for the field on the “one” side of the relationship do not contain empty or null values.

Managing Relationships Between Tables

Managing relationships in Power BI Desktop is often the key to creating an efficient data model. It is, however, outside the scope of this book to provide a complete course in data modeling. Nonetheless, here are a few tips to bear in mind when creating your initial data models:
  • It can help to think in terms of main tables and lookup tables. In the data model that you have looked at in this chapter, you can consider certain tables as lookup tables for the main data, such as the Colors, Countries, and Clients tables.
  • Lookup tables generally contain a series of values that are not repeated in the table (a list of countries, for instance). These values are called the “one” side of a relationship and are linked to another table where they are referred to on many occasions. Hence the table that contains the multiple references is called the “many” side of the relationship. This is also called the cardinality of a relationship by database and data warehouse designers.
  • If you take a look at Figure 10-10 (the Power BI Desktop Relationship View) you can see that the different types of relationship are indicated in the join between related tables. A small 1 at the end of a relationship indicates the “one” side of a relationship, and an asterisk indicates the “many” side of the relationship.
  • If your source data contains many lookup tables that cascade down through a series of relationships (a classic case is the Category ➤ Subcategory ➤ Product hierarchy that you find in many retail environments), to avoid overcomplicating the data model , you may prefer to merge multiple tables into a single table using Power BI Desktop Query before developing the data model in Power BI Desktop itself.
  • Sometimes—and this can be the case when importing data from relational databases—you need to join tables on more than one field. This is not possible in Power BI Desktop. However, you can often find workarounds to this, again using Power BI Desktop Query before modeling the data. In cases like this, you can merge tables by creating joins using multiple columns (as you saw in Chapter 8), for instance.
  • Data imported from data warehouses can have a built-in structure of facts (main tables containing metrics) and dimensions (containing lookup elements). However, you may want to “flatten” complex hierarchies of dimensions and create single-level tables of lookup elements here, too, using Power BI Desktop Query .
  • Sometimes you may want to use the same table twice in different contexts. For instance, a date table may be useful to join to a sale date and a purchase date. In cases like this, you can reimport the date table a second time (and give it another name) and then create two separate joins from a lookup table to the two different lookup tables. This allows you to filter and aggregate data by separate date criteria. A lookup table like this is often called a role-playing dimension.
  • It is possible to create multiple relations between one table and another, and to specify that only one of them is active. You can then force a calculation to apply a nonactive relationship using the USERELATIONSHIP() function in DAX. However, we are already starting to reach more complex levels of data modeling, so I will only mention the possibility here.

Cross Filter Direction

You can see the two cross-filtering options in Figure 10-23. These allow you to specify whether filters will apply to all tables in a joined set of tables or only in the table where an aggregation is being carried out.
A370912_2_En_10_Fig23_HTML.jpg
Figure 10-23.
Cross filter direction in table relationships
The two options are described in Table 10-7.
Table 10-7.
Cross Filter Direction Options
Cross Filter Direction Option
Description
Single
Filters on the linked table will apply to the table where the data aggregations take place, but not the reverse.
Both
For filtering purposes, both tables are considered as if they are a single, larger, table. All data in all tables will be filtered.

Other Relationship Options

The fundamental remaining options for relationships are
  • Make this relationship active
  • Assume referential integrity

Make this Relationship Active

In some data models, there can be multiple relationships (on different fields) between tables. However, only one of these relationships can be active at any one time. Clicking the “Make this relationship” active check box tells Power BI Desktop that the selected relationship is the active one. Other relationships can, however, be used in DAX code, as you will see in the upcoming chapters.

Assume Referential Integrity

This option is only available when using DirectQuery (which you learned to use in Chapter 4). By definition , this option applies only to a relational database. Enabling this lets Power BI Desktop send more efficient queries to the underlying data source. This means that data is updated faster. For this to work, there are a couple of basic requirements :
  • Data in the From column in the relationship can never be Null or blank.
  • For each data element in the “from” column, there is a corresponding value in the “to” column.
  • The “from” column is on the “many” side in a one-to-many relationship, or the relationship is a one-to-one type of join.
Note
You need to be aware that setting the “Assume referential integrity” option when the underlying data does not fulfill the preceding requirements will not prevent the option from being applied, or even data being returned. However, the results might be erroneous.

Reimporting Related Tables

There is one fairly important point to make to conclude this chapter. This is that if you delete a set of related tables and subsequently reimport them without importing the relationships , then Power BI Desktop will not remember the relationships that existed previously. Consequently, you will have to re-create any relationships manually. The same is true if you delete and reimport any table that you linked to an existing table in Power BI Desktop—once a relationship has been removed through the process of deleting a table, you will have to re-create it.

Conclusion

This chapter was all about taking the clean data that you had prepared using Power BI Desktop Query and molding it into a structured and coherent data model that will be the basis for your dashboards and reports.
To begin with, you saw how to look at the whole dataset that was now available in the data model. This included sorting the data and adjusting column widths.
Then you learned how to ensure that each column was defined as having the appropriate data type. After that you applied any number formats that would be required in future dashboard elements directly to the data model. You also saw how to prepare certain types of column for use in maps or to provide hyperlinks.
Most importantly, you then learned how to create table relationships that pull all the disparate data sources together in a joined-up data model that has now become the basis for some in-depth analytics. This will, in most circumstances, be the singular most important aspect of the data model. Simply put, a good, clean, and well-structured data model will allow you to get the most out of your data.
All that you have to do now is add any calculated metrics that your reports need. This is the subject of the next three chapters.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.150.219