©  Adam Aspin 2018
Adam AspinPro Power BI Desktophttps://doi.org/10.1007/978-1-4842-3210-1_6

6. Dealing with Datasets

Adam Aspin
(1)
Stoke-on-Trent, Staffordshire, UK
 
In the previous four chapters you saw some of the ways in which you can find and load data into the Power BI Desktop data model. Inevitably, this is the first part of any process that you follow to extract, transform, and load data. Yet it is quite definitely only a first step. Once the data is in Power BI Desktop, you need to know how to adapt it to suit your requirements in a multitude of ways. This is because not all data is ready to be used immediately. Quite often, you have to do some initial work on the data to make it more easily useable in Power BI Desktop. Tweaking source data is generally referred to as data transformation, which is the subject of this chapter as well as the next two.
The range of transformations that Power BI Desktop offers is extensive and varied. Learning to apply the techniques that Power BI Desktop makes available enables you to take data as you find it, then cleanse it and push it back into the Power BI Desktop data model as a series of coherent and structured data tables. Only then is it ready to be used to create compelling dashboards and reports.
As it is all too easy to be overwhelmed (at least initially) by the extent of the data transformation options that Power Bi Desktop has to offer, I have grouped the possible modifications into four categories. These categories are my own, and are merely a suggestion to facilitate understanding:
  • Data transformation : This includes adding and removing columns and rows, renaming columns, as well as filtering data.
  • Data modification : This covers altering the actual data in the rows and columns of a dataset.
  • Extending datasets : This encompasses adding further columns, possibly expanding existing columns into more columns or rows, and adding calculations.
  • Joining datasets : This involves combining multiple separate datasets—possibly from different data sources—into a single dataset.
This chapter introduces you to the core techniques that you can apply to shape each individual dataset that you have loaded. These transformations include
  • Renaming, removing, and reordering columns
  • Removing groups or sets of rows
  • Deduplicating datasets
  • Sorting the data
  • Excluding records by filtering the data
  • Grouping records
In the next chapter, you learn how to cleanse and modify data. In Chapter 8, you see how to subset columns to extract part of the available data in a column, calculate columns, merge data from separate queries, and add further columns containing different types of calculations, and you learn about pivoting and unpivoting data. So, if you cannot find what you are looking for in this chapter, there is a good chance that the answer is in the following two chapters.
In this chapter, I will also use a set of example files that you can find on the Apress web site. If you have followed the instructions in Appendix A, then these files are in the C:PowerBiDesktopSamplesCH06 folder.

Power BI Desktop Queries

In Chapter 1, you saw how to load source data directly into Power BI Desktop and use it immediately to create dashboards. Clearly, this approach presumes that the data that you are using is perfectly structured, clean, and error-free. Source data is nearly always correct and ready to use in reports and dashboards when it comes from “corporate” data sources such as data warehouses (held in relational, dimensional, or tabular databases). This is not always the case when you are faced with multiple disparate sources of data that have not been precleansed and prepared by an IT department. The everyday reality is that you could have to cleanse and transform much of the source data that you will use for your Power BI Desktop dashboards.
The really good news is that the kind of data transformation that used to require expensive servers and industrial-strength software is now available for free. Yes, Power BI Desktop comes with an awesome ETL (Extract, Transform, and Load) tool that can rival many applications that cost hundreds of thousands of dollars.
Power BI Desktop data transformation is carried out using queries. As you saw in Chapter 1, you do not have to modify source data. You can load it directly if it is ready for use. Yet if you need to cleanse the data, you add an intermediate step between connecting the data and loading it into the Power BI Desktop data model. This intermediate step uses the Power BI Desktop Query Editor to tweak the source data.
So how do you apply queries to transform your data? You have two choices:
  • Load the data first from one or more sources, and then transform it later.
  • Edit each source data element in a query before loading it.
Power BI Desktop is extremely forgiving. It does not force you to select one or the other method and then lock you in to the consequences of your decision. You can load data first and then realize that it needs some adjustment, switch to the Query Editor and make changes, and then return to creating your dashboard. Or you can first focus on the data and try to get it as polished and perfect as possible before you start building reports. The choice is entirely up to you.
To make this point, let’s take a look at both of these ways of working.
Note
At risk of being pedantic and old-fashioned, I would advise you to make notes when creating really complex transformations, because going back to a solution and trying to make adjustments later can be painful when they are not documented at all.

Editing Data After a Data Load

In Chapter 1, you saw how to load the Excel workbook CarSales.xlsx directly into the Power BI Desktop data model and use it to create a starter dashboard. Now let’s presume that you want to make some changes to the data structure of the data that you have already loaded. Specifically, you want to rename the CostPrice column. The file that you want to modify is CH06Example1.pbix file in the C:PowerBiDesktopSamplesCH06 directory.
  1. 1.
    Launch Power BI Desktop.
     
  2. 2.
    Open the sample file C:PowerBiDesktopSamplesCH06CH06Example1.pbix. Take a look at the Fields list and note that there is a field named CostPrice.
     
  3. 3.
    In the Power BI Desktop Home ribbon, click the Edit Queries button. The Power BI Desktop Query Editor will open and display the source data as a table. The window will look like Figure 6-1.
    A370912_2_En_6_Fig1_HTML.jpg
    Figure 6-1.
    The Power BI Desktop Query Editor
     
  4. 4.
    Right-click the title of the CostPrice column. The column will be selected and the title will appear in yellow.
     
  5. 5.
    Select Rename from the context menu.
     
  6. 6.
    Type VehicleCost and press Enter. The column title will change to VehicleCost.
     
  7. 7.
    In the Power BI Desktop Query Editor Home ribbon, click the Close & Apply button. The Power BI Desktop Query Editor will close and return you to the Power BI Desktop window. VehicleCost has replaced CostPrice anywhere that it was used in the dashboard. This is immediately visible in the Fields list.
     
I hope that this simple example makes it clear that transforming the source data is a quick and painless process. The technique that you applied—renaming a column—is only one of many dozens of possible techniques that you can apply to transform your data. However, it is not the specific transformation that is the core idea to take away here. What you need to remember is that the data that underpins your dashboard is always present and it is only a single click away. At any time, you can “flip” to the data and make changes, simply by clicking the Edit Queries button in the Power BI Desktop window. Any changes that you make and confirm will update your dashboards and reports instantaneously.
Note
Alternatively, if you want you can edit the query behind any table that is visible in the Report or Data views simply by right-clicking in the table and selecting Edit Query from the context menu.

Transforming Data Before Loading

On some occasions, you might prefer to juggle with your data before you load it. This is a variation on the approach that you have used in Chapter 1 when creating a simple dashboard. Do the following to transform your data before it appears in the Power BI Desktop window:
  1. 1.
    Open a new Power BI Desktop window.
     
  2. 2.
    In the Power BI Desktop Home ribbon, click the tiny triangle in the Get Data button.
     
  3. 3.
    Select Excel in the menu and open the Excel file C:PowerBiDesktopSamplesCH06CarSales.xlsx.
     
  4. 4.
    In the Navigator window, select the BaseData worksheet.
     
  5. 5.
    Click the Edit button (not the Load button).
     
  6. 6.
    The Power BI Desktop Query Editor will open and display the source data as a table.
     
  7. 7.
    Carry out steps 4 through 6 from the previous example to rename the CostPrice column.
     
  8. 8.
    In the Power BI Desktop Query Editor Home ribbon, click the Close & Apply button. The Power BI Desktop Query Editor will close and return you to the Power BI Desktop window. You will see the Apply Query Changes dialog while the data is loaded, like the one that you can see in Figure 6-2.
     
A370912_2_En_6_Fig2_HTML.jpg
Figure 6-2.
The Apply Query Changes dialog
This time, you have made a simple modification to the data before loading the dataset into the Power BI Desktop data model. The data modification technique was exactly the same. The only difference between loading the data directly and taking a detour via the Query window was clicking Edit instead of Load in the Navigator dialog. This means that the data was only loaded once you had finished making any modifications to the source data in the Power BI Desktop Query Editor.

Query or Load?

Power BI Desktop always gives you the choice of loading data directly into its data model or taking a constructive detour via Power BI Desktop Query. The path that you follow is entirely up to you and clearly depends on each set of circumstances. Nonetheless, you might want to consider the following basic principles when faced with a new dashboarding challenge using unfamiliar data:
  • Are you convinced that the data is ready to use? That is, is it clean and well structured? If so, then you can try loading it directly into the Power BI Desktop data model.
  • Are you faced with multiple data sources that need to be combined and molded into a coherent structure? If this is the case, then you really need to transform the data using Power BI Desktop Query.
  • Does the data come from an enterprise data warehouse? This could be held in a relational database, a SQL Server Analysis Services cube, or even an in-memory tabular data warehouse. As these data sources are nearly always the result of many hundreds—or even thousands—of hours of work cleansing, preparing, and structuring the data, you can probably load these straight into the data model.
  • Does the data need to be preaggregated and filtered? Think Power BI Desktop Query.
  • Are you likely to need to change the field names to make the data more manageable? It could be simpler to load the data directly into the data model and change them there.
  • Are you faced with lots of lookup tables that need to be added to a “core” data table? Then Power BI Desktop Query is your friend.
  • Does the data contain many superfluous or erroneous elements? Then use Power BI Desktop Query to remove these as a first step.
  • Does the data need to be rationalized and standardized to make it easier to handle? In this case, the path to success is via Power BI Desktop Query.
  • Is the data source enormous? If this is the case, you could save time by editing the data first in the Query Editor. This is because the Query Editor only loads a sample of the data for you to tweak. The entire dataset will only be loaded when you confirm all your modifications and close the Query Editor.
These kinds of questions are only rough guidelines. Yet they can help to point you in the right direction when you are working with Power BI Desktop. Inevitably, the more that you work with this application, the more you will develop the reflexes and intuition that will help you make the correct decisions. Remember, however, that Power BI Desktop is there to help, and that even a directly loaded dataset is based on a query. So you can always load data and then decide to tweak the query structure later if you need to. Alternatively, editing data in a Query window can be a great opportunity to take a closer look at your data before loading it into the data model—and it only adds a couple of clicks.
So feel free to adopt a way of working that you feel happy with. Power BI Desktop will adapt to your style easily and almost invisibly, letting you switch from data to dashboards so fluidly that it will likely become second nature.
The remainder of this chapter will take you through some of the core techniques that you need to know to cleanse and shape your data. However, before getting into all the detail, let’s take a quick, high-level look at the Power BI Desktop Query Editor and the way that it is laid out.

The Power BI Desktop Query Editor

All of your data transformation will take place in the Power BI Desktop Query Editor. It is a separate window from the one where you create your dashboards and it has a slightly different layout.
The Power BI Desktop Query Editor consists of six main elements:
  • The four ribbons: Home, Transform, Add Column, and View
  • The Query list pane containing all the queries that have been added to a Power BI Desktop file
  • The Data window, where you can see a sample of the data for a selected query
  • The Query Settings pane that contains the list of steps used to transform data
  • The formula bar above the data that shows the code (written in the Power BI “M” language) that performs the selected transformation step
  • The status bar (at the bottom of the window) that indicates useful information, such as the number of rows and columns in a query table, and the date when the dataset was downloaded
The callouts for these elements are shown in Figure 6-3.
A370912_2_En_6_Fig3_HTML.jpg
Figure 6-3.
The Power BI Desktop Query Editor, explained
Note
Occasionally you will see other ribbons appear in specific circumstances. You can see an example of this for dealing with lists in Chapters 8 and 9.

The Applied Steps List

Data transformation is by its very nature a sequential process. So the Query window stores each modification that you make when you are cleansing and shaping source data. The various elements that make up a data transformation process are listed in the Applied Steps list of the Query Settings pane in the Query Editor.
The Power BI Desktop Query Editor does not number the steps in a data transformation process, but it certainly remembers each one. They start at the top of the Applied Steps list (nearly always with the Source step) and can extend to dozens of individual steps that trace the evolution of your data until you load it into the data model. You can, if you want, consider the Query Editor as a kind of “macro recorder.”
Moreover, as you click each step in the Applied Steps list, the data in the Data window changes to reflect the results of each transformation, giving you a complete and visible trail of all the modifications that you have applied to the dataset.
The Applied Steps list gives a distinct name to the step for each and every data modification option that you cover in this chapter and the next. As it can be important to understand exactly what each function actually achieves, I will always draw to your attention the standard name that Power BI Desktop Query applies.

The Power BI Desktop Query Editor Ribbons

Power BI Desktop Query Editor uses (in the August 2017 version, at least) four ribbons. They are fundamental to what you learn in the course of this chapter. They are as follows:
  • The Home ribbon
  • The Transform ribbon
  • The Add Column ribbon
  • The View ribbon
I am not suggesting for a second that you need to memorize what all the buttons in these ribbons do. What I hope is that you are able to use the following brief descriptions of the Query Editor ribbon buttons to get an idea of the amazing power of Power BI Desktop in the field of data transformation. So if you have an initial dataset that is not quite as you need it, you can take a look at the resources that Power BI Desktop has to offer and how they can help. Once you find the function that does what you are looking for, you can jump to the relevant section for the full details on how to apply it.

The Home Ribbon

Since we will be making intense use of the Power BI Desktop Query Editor Home ribbon to transform data, it is important to have an idea of what it can do. I explain the various options in Figure 6-4 and in Table 6-1.
A370912_2_En_6_Fig4_HTML.jpg
Figure 6-4.
The Query Editor Home ribbon
Table 6-1.
Query Editor Home Ribbon Options
Option
Description
Close & Apply
Finishes the processing steps; saves and closes the query.
New Source
Lets you discover and add a new data source to the set of queries.
Recent Sources
Lists all the recent data sources that you have used.
Enter Data
Lets you add your own specific data in a custom table.
Data Source Settings
Allows you to manage settings for data sources that you have already connected to.
Manage Parameters
Lets you view and modify any parameters defined for this Power BI Desktop file.
Refresh Preview
Refreshes the preview data.
Properties
Displays the core query properties.
Advanced Editor
Displays the “M” language editor.
Manage
Lets you delete, duplicate, or reference a query.
Choose Columns
Lets you select the columns to retain from all the columns available in the source data.
Remove Columns
Lets you remove one or more columns.
Keep Rows
Keeps the specified number of rows at the top of the table.
Remove Rows
Removes a specified number of rows from the top of the data table.
Sort
Sorts the table using the selected column as the sort key.
Split Column
Splits a column into one or many columns at a specified delimiter or after a specified number of characters.
Group By
Groups the table using a specified set of columns and aggregates any numeric columns for this grouping.
Data Type
Applies the chosen data type to the column.
Use First Row As Headers
Uses the first row as the column titles.
Replace Values
Carries out a search-and-replace operation on the data in a column or columns. This only affects the complete data in a column.
Merge Queries
Joins a second query table to the current query results and aggregates or adds data from the second to the first.
Append Queries
Adds the data from another query to the current query in the current Power BI Desktop file.
Combine Files
Adds the data from a series of similarly structured files into a single table.

The Transform Ribbon

The Transform ribbon, as its name implies, contains a wealth of functions that can help you to transform your data. The various options it contains are explained in Figure 6-5 and Table 6-2.
A370912_2_En_6_Fig5_HTML.jpg
Figure 6-5.
The Query Editor Transform ribbon
Table 6-2.
Query Editor Transform Ribbon Options
Option
Description
Group By
Groups the table using a specified set of columns; aggregates any numeric columns for this grouping.
Use First Row As Headers
Uses the first row as the column titles.
Transpose
Transforms the columns into rows and the rows into columns.
Reverse Rows
Displays the source data in reverse order, showing the final rows at the top of the window.
Count Rows
Counts the rows in the table and replaces the data with the row count.
Data Type
Applies the chosen data type to the column.
Detect Data Type
Detects the correct data type to apply to multiple columns.
Rename
Renames a column.
Replace Values
Carries out a search-and-replace operation inside a column, replacing a specified value with another value.
Fill
Copies the data from cells above or below into empty cells in the column.
Pivot Column
Creates a new set of columns using the data in the selected column as the column titles.
Unpivot Columns
Takes the values in a set of columns and unpivots the data, creating two new columns using the column headers as the descriptive elements.
Move
Moves a column.
Convert to List
Converts the contents of a column to a list.
Split Column
Splits a column into one or many columns at a specified delimiter or after a specified number of characters.
Format
Modifies the text format of data in a column (uppercase, lowercase, capitalization) or removes trailing spaces.
Merge Columns
Takes the data from several columns and places it in a single column, adding an optional separator character.
Extract
Replaces the data in a column using a defined subset of the current data. You can specify a number of characters to keep from the start or end of the column, set a range of characters beginning at a specified character, or even list the number of characters in the column.
Parse
Creates an XML or JSON document from the contents of each cell in a column.
Statistics
Returns the Sum, Average, Maximum, Minimum, Median, Standard Deviation, Count, or Distinct Value Count for all the values in the column.
Standard
Carries out a basic mathematical calculation (add, subtract, divide, multiply, integer-divide, or return the remainder) using a value that you specify applied to each cell in the column.
Scientific
Carries out a basic scientific calculation (square, cube, power of n, square root, exponent, logarithm, or factorial) for each cell in the column.
Trigonometry
Carries out a basic trigonometric calculation (Sine, Cosine, Tangent, ArcSine, ArcCosine, or ArcTangent) using a value that you specify applied to each cell in the column.
Rounding
Rounds the values in the column either to the next integer (up or down) or to a specified factor.
Information
Replaces the value in the column with simple information: Is Odd, Is Even, or Positive/Negative.
Date
Isolates an element (day, month, year, etc.) from a date value in a column.
Time
Isolates an element (hour, minute, second, etc.) from a date/time or time value in a column.
Duration
Calculates the duration from a value that can be interpreted as a duration in days, hours, minutes, and so forth.
Expand
Adds the (identically structured) data from another query to the current query.
Aggregate
Calculates the sum or product of numeric columns from another query and adds the result to the current query.
Extract Values
Extracts the values of the contents of a column as a single text value.
Scripts
Runs scripts from languages such as “R.”

The Add Column Ribbon

The Add Column ribbon does a lot more than just add columns. It also contains functions to break columns down into multiple columns, and to add columns containing dates and calculations based on existing columns. The various options it contains are explained in Figure 6-6 and Table 6-3.
A370912_2_En_6_Fig6_HTML.jpg
Figure 6-6.
The Query Editor Add Column ribbon
Table 6-3.
Query Editor Add Column Ribbon Options
Option
Description
Column From Examples
Lets you use one or more columns as examples to create a new column.
Custom Column
Adds a new column using a formula to create the column’s contents.
Invoke Custom Function
Applies an “M” language function to every row.
Conditional Column
Adds a new column that conditionally adds the values from the selected column.
Index Column
Adds a sequential number in a new column to uniquely identify each row.
Duplicate Column
Creates a copy of the current column.
Format
Modifies the text format of data in a column (uppercase, lowercase, capitalization) or removes trailing spaces.
Merge Columns
Takes the data from several columns and places it in a single column, adding an optional separator character.
Extract
Replaces the data in a column using a defined subset of the current data. You can specify a number of characters to keep from the start or end of the column, set a range of characters beginning at a specified character, or even list the number of characters in the column.
Parse
Creates an XML or JSON document from the contents of each cell in a column.
Statistics
Returns the Sum, Average, Maximum, Minimum, Median, Standard Deviation, Count, or Distinct Value Count for all the values in the column .
Standard
Carries out a basic mathematical calculation (add, subtract, divide, multiply, integer-divide, or return the remainder) using a value that you specify applied to each cell in the column.
Scientific
Carries out a basic scientific calculation (square, cube, power of n, square root, exponent, logarithm, or factorial) for each cell in the column.
Trigonometry
Carries out a basic trigonometric calculation (Sine, Cosine, Tangent, ArcSine, ArcCosine, or ArcTangent) using a value that you specify applied to each cell in the column.
Rounding
Rounds the values in the column either to the next integer (up or down) or to a specified factor.
Information
Replaces the value in the column with simple information: Is Odd, Is Even, or Positive/Negative.
Date
Isolates an element (day, month, year, etc.) from a date value in a column.
Time
Isolates an element (hour, minute, second, etc.) from a date/time or time value in a column.
Duration
Calculates the duration from a value that can be interpreted as a duration in days, hours, minutes, and seconds.

The View Ribbon

The View ribbon lets you alter some of the Query Editor settings and see the underlying data transformation code. The various options that it contains are explained in the next chapter.

Dataset Shaping

So you are now looking at a data table that you have loaded into Power BI Desktop. For argument’s sake, let’s assume that it is the C:PowerBiDesktopSamplesCH06CH06Example1.pbix file from the sample data directory, and that you have clicked the Edit Queries button to display the Power BI Desktop Query Editor. What can you do to the BaseData dataset that is now visible? It is time to take a look at some of the core techniques that you can apply to shape the initial dataset. These include the following:
  • Renaming columns
  • Reordering columns
  • Removing columns
  • Merging columns
  • Removing records
  • Removing duplicate records
  • Filtering the dataset
I have grouped these techniques together as they affect the initial size and shape of the data. Also, it is generally not only good practice, but also easier for you, the data modeler, if you begin by excluding any rows and columns that you do not need. I also find it easier to understand datasets if the columns are logically laid out and given comprehensible names from the start. All in all, this makes working with the data easier in the long run.

Renaming Columns

Although we took a quick look at renaming columns in the first pages of this chapter, let’s look at this technique again in more detail. I admit that renaming columns is not actually modifying the form of the data table. However, when dealing with data, I consider it vital to have all data clearly identifiable. This implies meaningful column names being applied to each column. Consequently, I consider this modification to be fundamental to the shape of the data and also as an essential best practice when importing source data.
To rename a column:
  1. 1.
    Click inside the column that you want to rename.
     
  2. 2.
    Click Transform to activate the Transform ribbon.
     
  3. 3.
    Click the Rename button. The column name will be highlighted.
     
  4. 4.
    Enter the new name or edit the existing name.
     
  5. 5.
    Press Enter or click outside the column title.
     
The column will now have a new title. The Applied Steps list on the right will now contain another element, Renamed Columns. This step will be highlighted.
Note
As an alternative to using the Transform ribbon, you can right-click the column title and select Rename.

Reordering Columns

Power BI Desktop will load data as it is defined in the data source. Consequently, the column sequence will be entirely dependent on the source data (or by a SQL query if you used a source database, as described in Chapter 2). This column order need not be definitive, however, and you can reorder the columns if that helps you understand and deal with the data. Do the following to change column order:
  1. 1.
    Click the header of the column you want to move.
     
  2. 2.
    Drag the column left or right to its new position. You will see the column title slide laterally through the column titles as you do this, and a thicker gray line will indicate where the column will be placed once you release the mouse button. Reordered Columns will appear in the Applied Steps list.
     
Figure 6-7 shows this operation.
A370912_2_En_6_Fig7_HTML.jpg
Figure 6-7.
Reordering columns
If your query contains dozens—or even hundreds—of columns , you may find that dragging a column around can be slow and laborious. Equally, if columns are extremely wide, it can be difficult to “nudge” a column left or right. Power BI Desktop can come to your aid in these circumstances with the Move button in the Transform ribbon. Clicking this button gives you the menu options that are outlined in Table 6-4.
Table 6-4.
Move Button Options
Option
Description
Left
Moves the currently selected column to the left of the column on its immediate left.
Right
Moves the currently selected column to the right of the column on its immediate right.
To Beginning
Moves the currently selected column to the left of all the columns in the query.
To End
Moves the currently selected column to the right of all the columns in the query.
The Move command also works on a set of columns that you have selected by Ctrl-clicking and/or Shift-clicking. Indeed, you can move a selection of columns that is not contiguous if you need to.
Note
You need to select a column (or a set of columns) before clicking the Move button. If you do not, then the first time that you use Move, Power BI Desktop Query selects the column(s) but does not move it.

Removing Columns

So how do you delete a column or series of columns? Like this:
  1. 1.
    Click inside the column you want to delete, or if you want to delete several columns at once, Ctrl-click the titles of the columns that you want to delete.
     
  2. 2.
    Click the Remove Columns button in the Home ribbon. The column(s) will be deleted and Removed Columns will be the latest element in the Applied Steps list.
     
When working with imported datasets over which you have had no control, you may frequently find that you only need a few columns of a large data table. If this is the case, you will soon get tired of Ctrl-clicking numerous columns to select those you want to remove. Power BI Desktop has an alternative method. Just select the columns you want to keep and delete the others. To do this:
  1. 1.
    Ctrl-click the titles of the columns that you want to keep.
     
  2. 2.
    Click the small triangle in the Remove Columns button in the Home ribbon. Select Remove Other Columns from the menu. All unselected columns will be deleted and Removed Other Columns will be added to the Applied Steps list.
     
When selecting a contiguous range of columns to remove or keep, you can use the standard Windows Shift-click technique to select from the first to the last column in the block of columns that you want to select.
Note
Both of these options for removing columns are also available from the context menu, if you prefer. It shows Remove (or Remove Columns, if there are several columns selected) when deleting columns, as well as Remove Other Columns if you right-click a column title.

Choosing Columns

If you prefer not to scroll through a wide dataset, yet still need to select a subset of columns as the basis for your reports, then there is another way to define the collection of fields that you want to use. You can choose the columns that you want to keep (and, by definition, those that you want to exclude) like this:
  1. 1.
    Open the sample file CH06Example1.pbix in the folder C:PowerBiDesktopSamplesCH06 unless it is already open.
     
  2. 2.
    In the Home ribbon, click the Edit Queries button. The Query Editor will open.
     
  3. 3.
    In the Home ribbon of the Query Editor, click the Choose Columns button.
     
  4. 4.
    Click (Select All Columns) to deselect the entire collection of columns in the dataset.
     
  5. 5.
    Select the columns Make, Model, Color, and SalePrice. The Choose Columns dialog will look like the one in Figure 6-8.
    A370912_2_En_6_Fig8_HTML.jpg
    Figure 6-8.
    The Choose Columns dialog
     
  6. 6.
    Click OK. The Query Editor will only display the columns that you selected.
     
The Choose Columns dialog comes with a couple of extra functions that you might find useful when choosing the set of columns that you want to work with:
  • You can sort the column list in alphabetical order (or, indeed, revert to the original order) by clicking the Sort icon (the small A-Z) at the top right of the Choose Columns dialog and selecting the required option.
  • You can filter the list of columns that is displayed simply by entering a few characters in the Search Columns field at the top of the dialog and then pressing Enter.
  • The (Select All Columns) option switches between selecting and deselecting all the columns in the list.

Merging Columns

Source data is not always exactly as you wish it could be (and that is sometimes a massive understatement). Certain data sources could have data spread over many columns that could equally well be merged into a single column. So it probably comes as no surprise to discover that Power BI Desktop Query can carry out this kind of operation too. Here is how to do it:
  1. 1.
    Ctrl-click the headers of the columns that you want to merge (Make and Model in the BaseData dataset in this example).
     
  2. 2.
    In the Transform ribbon, click the Merge Columns button. The Merge Columns dialog will be displayed.
     
  3. 3.
    From the Separator pop-up menu, select one of the available separator elements. I chose Colon in this example.
     
  4. 4.
    Enter a name for the column that will be created from the two original columns (I am calling it MakeAndModel). The dialog should look like Figure 6-9.
    A370912_2_En_6_Fig9_HTML.jpg
    Figure 6-9.
    The Merge Columns dialog
     
  5. 5.
    Click OK. The columns that you selected will be replaced by the data from all the columns, as shown in Figure 6-10.
    A370912_2_En_6_Fig10_HTML.jpg
    Figure 6-10.
    The result of merging columns
     
I need to make a few comments about this process:
  • You can select as many columns as you want when merging columns .
  • If you do not give the resulting column a name in the Merge Columns dialog, it will simply be renamed Merged. You can always rename it later if you want.
  • The order in which you select the columns affects the way that the data is merged. So, always begin by selecting the column whose data must appear at the left of the merged column, then the column whose data should be next, and so forth. You do not have to select columns in the order that they initially appeared in the dataset.
  • If you do not want to use any of the standard separators that Power BI Desktop Query suggests, you can always define your own. Just select --Custom-- in the pop-up menu in the Merge Columns dialog. A new box will appear in the dialog, in which you can enter your choice of separator. This can be composed of several characters if you really want.
  • Merging columns from the Transform ribbon removes all the selected columns and replaces them with a single column. The same option is also available from the Add Column ribbon—only in this case, this operation adds a new column and leaves the original columns in the dataset.
Note
This option is also available from the context menu if you right-click a column title.
The available merge separators are described in Table 6-5.
Table 6-5.
Merge Separators
Option
Description
Colon
Uses the colon (:) as the separator.
Comma
Uses the comma (,) as the separator.
Equals Sign
Uses the equals sign (=) as the separator.
Semi-Colon
Uses the semicolon (;) as the separator.
Space
Uses the space ( ) as the separator.
Tab
Uses the tab character as the separator.
Custom
Lets you enter a custom separator.
Tip
You can split, remove, and duplicate columns using the context menu if you prefer. Just remember to right-click the column title to display the correct context menu.

Going to a Specific Column

Power BI Desktop can load datasets that contain hundreds of columns. As scrolling left and right across dozens of columns can be more than a little frustrating, you can always jump to a specific column at any time.
  1. 1.
    In the Home ribbon of the Query Editor, click the small triangle at the bottom of the Choose Columns button. The Go to Column dialog will appear.
     
  2. 2.
    Select the column you want to move to. The dialog will look like Figure 6-11.
    A370912_2_En_6_Fig11_HTML.jpg
    Figure 6-11.
    The Go to Column dialog
     
  3. 3.
    Click OK. Power BI Desktop will select the chosen column.
     
Tip
If you prefer, you can double-click a column name in the Go to Column dialog to move to the chosen column.

Removing Records

You may not always need all the data that you have loaded into a Power BI Desktop query. There could be several possible reasons for this:
  • You are taking a first look at the data and you only need a sample to get an idea of what the data is like.
  • The data contains records that you clearly do not need and that you can easily identify from the start.
  • You are testing data cleansing and you want a smaller dataset to really speed up the development of a complex data extractions and transformation process.
  • You want to analyze a reduced dataset to extrapolate theses and inferences, and to save analysis on a full dataset for later, or even using a more industrial-strength toolset such as SQL Server Integration Services.
To allow you to reduce the size of the dataset, Power BI Desktop proposes two basic approaches out of the box:
  • Keep certain rows
  • Remove certain rows
Inevitably, the technique that you adopt will depend on the circumstances. If it is easier to specify the rows to sample by inclusion, then the keep-certain-rows approach is the best option to take. Inversely, if you want to proceed by exclusion, then the remove-certain-rows technique is best. Let’s look at each of these in turn.

Keeping Rows

This approach lets you specify the rows that you want to continue using. It is based on the application of one of the following three choices:
  • Keep the top n records.
  • Keep the bottom n records.
  • Keep a specified range of records—that is, keep n records every y records.
Most of these techniques are very similar, so let’s start by imagining that you want to keep the top 50 records in the sample C:PowerBiDesktopSamplesCH06CH06Example1.pbix file.
  1. 1.
    In the Home ribbon of the Power BI Query Editor, click the Keep Rows button. The menu will appear.
     
  2. 2.
    Select Keep Top Rows. The Keep Top Rows dialog will appear.
     
  3. 3.
    Enter 50 in the “Number of rows” box, as shown in Figure 6-12.
    A370912_2_En_6_Fig12_HTML.jpg
    Figure 6-12.
    The Keep Top Rows dialog
     
  4. 4.
    Click OK. All but the first 50 records are deleted and Kept First Rows is added to the Applied Steps list.
     
To keep the bottom n rows , the technique is virtually identical. Follow the steps in the previous example, but select Keep Bottom Rows in step 2. In this case, the Applied Steps list displays Kept Last Rows.
To keep a range of records, you need to specify a starting record and the number of records to keep from then on. For instance, suppose that you wish to lose the first 10 records but keep the following 25. This is how to go about it:
  1. 1.
    In the Home ribbon, click the Keep Rows button.
     
  2. 2.
    Select Keep Range of Rows. The Keep Range of Rows dialog will appear.
     
  3. 3.
    Enter 11 in the “First row” box.
     
  4. 4.
    Enter 25 in the “Number of rows” box, as shown in Figure 6-13.
    A370912_2_En_6_Fig13_HTML.jpg
    Figure 6-13.
    The Keep Range of Rows dialog
     
  5. 5.
    Click OK. All but records 1–10 and 36 to the end are deleted and Kept Range Of Rows is added to the Applied Steps list.
     

Removing Rows

Removing rows is a nearly identical process to the one you just used to keep rows. As removing the top or bottom n rows is highly similar, I will not go through it in detail. All you have to do is click the Remove Rows button in the Home ribbon and follow the process as if you were keeping rows. The Applied Steps list will read Removed Top Rows or Removed Bottom Rows in this case, and rows will be removed instead of being kept in the dataset, of course.
The remove rows approach does have one very useful option that can be applied as a sampling technique. It allows you to remove one or more records every few records to produce a subset of the source data. To do this, you need to do the following:
  1. 1.
    Click the Remove Rows button in the Query window Home ribbon. The menu will appear.
     
  2. 2.
    Select Remove Alternate Rows. The Remove Alternate Rows dialog will appear.
     
  3. 3.
    Enter 10 as the First row to remove.
     
  4. 4.
    Enter 2 as the Number of rows to remove.
     
  5. 5.
    Enter 10 as the Number of rows to keep.
    The dialog will look like Figure 6-14.
    A370912_2_En_6_Fig14_HTML.jpg
    Figure 6-14.
    The Remove Alternate Rows dialog
     
  6. 6.
    Click OK. All but the records matching the pattern you entered in the dialog are removed . Removed Alternate Rows is then added to the Applied Steps list.
     
Note
If you are really determined to extract a sample that you consider to be representative of the key data, then you can always filter the data before subsetting it to exclude any outliers. Filtering data is explained later in this chapter.

Removing Blank Rows

If your source data contains completely blank (empty) rows, you can delete these as follows:
  1. 1.
    Click the Remove Rows button in the Query window Home ribbon. The menu will appear.
     
  2. 2.
    Select Remove Blank Rows.
     
This results in empty rows being deleted. Removed Blank Rows is then added to the Applied Steps list.

Removing Duplicate Records

An external source of data might not be quite as perfect as you might hope. One of the most annoying features of poor data is the presence of duplicates. These are insidious since they falsify results and are not always visible. If you suspect that the data table contains strict duplicates (that is, where every field is identical in two or more records), then you can remove the duplicates like this:
  1. 1.
    Click the Remove Duplicates in the popup menu for the table (this is at the top left of the table grid). All duplicate records are deleted and Removed Duplicates is added to the Applied Steps list.
     
Note
I must stress that this approach will only remove completely identical records. If two records have just one different character or a number but everything else is identical, then they are not considered duplicates by the Power BI Desktop Query Editor. Alternatively, if you want to isolate and examine the duplicate records, then you can display only completely identical records by selecting Keep Duplicates from the popup menu for the table.
So if you suspect or are sure that the data table you are dealing with contains duplicates, what are the practical solutions? This can be a real conundrum, but there are some basic techniques that you can apply:
  • Remove all columns that you are sure you will not be using later in the data-handling process. This way, Power BI Desktop will only be asked to compare essential data across potentially duplicate records.
  • Group the data on the core columns (this is explained later in this chapter).
Note
As you have seen, Power BI Desktop Query can help you to home in on the essential elements in a dataset in just a few clicks. If anything, you need to be careful that you are not removing valuable data—and consequently skewing your analysis—when excluding data from the query.

Sorting Data

Although not strictly a data modification step, sorting an imported table will probably be something that you want to do at some stage, if only to get a clearer idea of the data that you are dealing with. Do the following to sort the data:
  1. 1.
    Close the file CH06Example1.pbix (if, indeed, you have been using it to test the techniques explained so far in this chapter) without saving it.
     
  2. 2.
    Open the file C:PowerBiDesktopSamplesCH06CH06Example2.pbix.
     
  3. 3.
    Click Edit Queries to open the Query Editor.
     
  4. 4.
    Click inside the column you wish to sort by.
     
  5. 5.
    Click Sort Ascending (the A/Z icon) or Sort Descending (the Z/A icon) in the Home ribbon.
     
The data is sorted in either alphabetical (smallest to largest) or reverse alphabetical (largest to smallest) order. If you want to carry out a complex sort operation (that is, first by one column and then by another if the first column contains the same element over several rows), you do this simply by sorting the columns one after another. Power BI Desktop Query Editor adds a tiny 1, 2, 3, and so on to the right of the column title to indicate the sort sequence. You can see this in Figure 6-15, where I sorted first on the column Make, and finally on the column Model.
A370912_2_En_6_Fig15_HTML.jpg
Figure 6-15.
Sorting multiple columns
As sorting data is considered part of the data modification process, it also appears in the Applied Steps list as Sorted Rows.
Note
An alternative technique for sorting data is to click the pop-up menu for a column (the downward-facing triangle at the right of a column title) and select Sort Ascending or Sort Descending from the pop-up menu.

Reversing the Row Order

If you find that the data that you are looking at seems upside-down (that is, with the bottom rows at the top and vice versa), you can reverse the row order in a single click, if you want. To do this, do the following:
  • In the Transform ribbon, click the Reverse Rows button.
The entire dataset will be reversed and the bottom row will now be the top row.

Undoing a Sort Operation

If you subsequently decide that you do not want to keep your data sorted, you can undo the sort operation at any time, as follows:
  1. 1.
    Click the sort icon at the right of the name of the column that you used as the basis for the sort operation. The context menu will appear, as you can see in Figure 6-16.
    A370912_2_En_6_Fig16_HTML.jpg
    Figure 6-16.
    Removing a sort operation
     
  2. 2.
    Click Clear Sort.
     
The sort order that you applied will be removed, and the data will revert to its original row order.
Note
If you sorted the dataset on several columns, you can choose either to remove all the sort order that you applied by clicking the first column that you used to sort the data. This will remove the sort order that you applied to all the columns that you used to define the sort criteria. If all you want to do is undo the sort on the final column in a set of columns used to sort the recordset, then you can clear only the sort operation on this column.

Filtering Data

The most frequently used way of limiting a dataset is, in my experience, the use of filters on the table that you have loaded. Now, I realize that you may be coming to Power BI Desktop after years with Excel, or after some time using Power Pivot, and that the filtering techniques that you are about to see probably look much like the ones you have used in those two tools. However, because it is fundamental to include and exclude appropriate records when loading source data, I will thoroughly handle Power BI Desktop filters, even if this means that certain readers will experience a strong sense of déjà vu.
Here are two basic approaches for filtering data in Power BI Desktop:
  • Select one or more specific values from the unique list of elements in the chosen column.
  • Define a range of data to include or exclude.
The first option is common to all data types, whether they are text, number, or data/time. The second approach varies according to the data type of the column that you are using to filter data.

Selecting Specific Values

Selecting one or more values present in a column of data is as easy as this (assuming that you are still using the Power BI Desktop file CH06Example2.pbix and are in the Query Editor):
  1. 1.
    Click a column’s pop-up menu. (I used Make in the sample dataset in this example.) The filter menu appears.
     
  2. 2.
    Check all elements that you want to retain and uncheck all elements that you wish to exclude. In this example, I kept Bentley and Rolls-Royce, as shown in Figure 6-17.
    A370912_2_En_6_Fig17_HTML.jpg
    Figure 6-17.
    A filter menu
     
  3. 3.
    Click OK. The Applied Steps list adds Filtered Rows.
     
Note
You can deselect all items by clicking the (Select All) check box; reselect all the items by selecting this box again. It follows that, if you want to keep only a few elements, it may be faster to unselect all of them first and then only select the ones that you want to keep. If you want to exclude any records without a value in the column that you are filtering on, then select Remove Empty from the filter menu.

Finding Elements in the Filter List

Scrolling up and down in a filter list can get extremely laborious. A fast way of limiting the list to a subset of available elements is to do the following:
  1. 1.
    Click the pop-up menu for a column. (I use Model in the sample dataset in this example.) The filter menu appears.
     
  2. 2.
    Enter a letter or a few letters in the Search box. The list shortens with every letter or number that you enter. If you enter ar, then the filter popup will look like Figure 6-18.
     
A370912_2_En_6_Fig18_HTML.jpg
Figure 6-18.
Searching the filter menu
To remove a filter, all that you have to do is click the cross that appears at the right of the Search box.

Filtering Text Ranges

If a column contains text, then you can apply specific options to filter the data. These elements are found in the filter popup of any text-based column in the Text Filters submenu. The choices are given in Table 6-6.
Table 6-6.
Text Filter Options
Filter Option
Description
Equals
Sets the text that must match the cell contents.
Does Not Equal
Sets the text that must not match the cell contents.
Begins With
Sets the text at the left of the cell contents.
Does Not Begin With
Sets the text that must not appear at the left of the cell contents.
Ends With
Sets the text at the right of the cell contents.
Does Not End With
Sets the text that must not appear at the right of the cell contents.
Contains
Lets you enter a text that will be part of the cell contents.
Does Not Contain
Lets you enter a text that will not be part of the cell contents.

Filtering Numeric Ranges

If a column contains numbers, then there are also specific options that you can apply to filter the data. You’ll find these elements in the filter popup of any text-based column in the Number Filters submenu. The choices are given in Table 6-7.
Table 6-7.
Numeric Filter Options
Filter Option
Description
Equals
Sets the number that must match the cell contents.
Does Not Equal
Sets the number that must not match the cell contents.
Greater Than
Cell contents must be greater than this number.
Greater Than Or Equal To
Cell contents must be greater than or equal to this number.
Lesser Than
Cell contents must be less than this number.
Lesser Than Or Equal To
Cell contents must be less than or equal to this number.
Between
Cell contents must be between the two numbers.

Filtering Date and Time Ranges

If a column contains dates or times (or both), then specific options can also be applied to filter the data. These elements are found in the filter popup of any text-based column in the Date/Time Filters submenu. The choices are given in Table 6-8.
Table 6-8.
Date and Time Filter Options
Filter Element
Description
Equals
Filters data to include only records for the selected date.
Before
Filters data to include only records up to the selected date.
After
Filters data to include only records after the selected date.
Between
Lets you set an upper and a lower date limit to exclude records outside that range.
In the Next
Lets you specify a number of days, weeks, months, quarters, or years to come.
In the Previous
Lets you specify a number of days, weeks, months, quarters, or years up to the date.
Is Earliest
Filters data to include only records for the earliest date.
Is Latest
Filters data to include only records for the latest date.
Is Not Earliest
Filters data to include only records for dates not including the earliest date.
Is Not Latest
Filters data to include only records for dates not including the latest date.
Day ➤ Tomorrow
Filters data to include only records for the day after the current system date.
Day ➤ Today
Filters data to include only records for the current system date.
Day ➤ Yesterday
Filters data to include only records for the day before the current system date.
Week ➤ Next Week
Filters data to include only records for the next calendar week.
Week ➤ This Week
Filters data to include only records for the current calendar week.
Week ➤ Last Week
Filters data to include only records for the previous calendar week.
Month ➤ Next Month
Filters data to include only records for the next calendar month.
Month ➤ This Month
Filters data to include only records for the current calendar month.
Month ➤ Last Month
Filters data to include only records for the previous calendar month.
Month ➤ Month Name
Filters data to include only records for the specified calendar month.
Quarter ➤ Next Quarter
Filters data to include only records for the next quarter.
Quarter ➤ This Quarter
Filters data to include only records for the current quarter.
Quarter ➤ Last Quarter
Filters data to include only records for the previous quarter.
Quarter ➤ Quarter Name
Filters data to include only records for the specified quarter.
Year ➤ Next Year
Filters data to include only records for the next year.
Year ➤ This Year
Filters data to include only records for the current year.
Year ➤ Last Year
Filters data to include only records for the previous year.
Year ➤ Year To Date
Filters data to include only records for the calendar year to date.
Custom Filter
Lets you set up a specific filter for two possible date ranges.

Filtering Data

Filtering data uses a very similar approach, whatever the type of filter that is applied. As a simple example, here is how to apply a number filter to the sale price to find vehicles that sold for less than £5,000.00:
  1. 1.
    Click the popup menu for the SalePrice column.
     
  2. 2.
    Click Number Filters. The submenu will appear.
     
  3. 3.
    Select Less Than. The Filter Rows dialog will be displayed.
     
  4. 4.
    Enter 5000 in the box next to the “is less than” box, as shown in Figure 6-19.
    A370912_2_En_6_Fig19_HTML.jpg
    Figure 6-19.
    The Filter Rows dialog
     
  5. 5.
    Click OK. The dataset only displays rows that conform to the filter that you have defined.
     
Although extremely simple to apply, filters do require a few comments:
  • You can combine up to two elements in a basic filter. These can be mutually inclusive (an AND filter) or they can be an alternative (an OR filter).
  • You can combine several elements in an advanced filter—as you will learn in the next section.
  • You should not apply any formatting when entering numbers.
  • Any text that you filter on is not case-sensitive.
  • If you choose the wrong filter, you do not have to cancel and start over. Simply select the correct filter type from the popup in the left-hand boxes in the Filter Rows dialog.
Tip
If you set a filter value that excludes all the records in the table, Power BI Desktop displays an empty table except for the words “This table is empty.” You can always remove the filter by clicking the cross to the left of Filtered Rows in the Applied Steps list. This will remove the step and revert the data to its previous state.

Applying Advanced Filters

Should you ever need to be extremely specific when filtering data, you can always use Power BI Desktop’s advanced filters. These let you extend the filter elements so that you can include or exclude records to a high level of detail. Here is the procedure:
  1. 1.
    Click the popup menu for the SalePrice column.
     
  2. 2.
    Click Number Filters. The submenu will appear.
     
  3. 3.
    Select Equals. The Filter Rows dialog will be displayed.
     
  4. 4.
    Click Advanced.
     
  5. 5.
    Enter 5000 as the value for the first filter element in the dialog.
     
  6. 6.
    Select Or from the popup as the filter type for the second filter element.
     
  7. 7.
    Select Equals as the operator.
     
  8. 8.
    Enter 89000 as the value for the second filter in the dialog.
     
  9. 9.
    Click Add clause. A new filter element will be added to the dialog under the existing elements.
     
  10. 10.
    Select Equals as the operator.
     
  11. 11.
    Enter 178500 as the value for the third filter element in the dialog. The Filter dialog will look like the one shown in Figure 6-20.
    A370912_2_En_6_Fig20_HTML.jpg
    Figure 6-20.
    Advanced filters
     
  12. 12.
    Click OK. Only records containing the figures that you entered in the Filter dialog will be displayed in the Power BI Desktop Query Editor.
     
I would like to finish on the subject of filters with a few comments:
  • In the Advanced Filter dialog you can “mix and match” columns and operators to achieve the filter result that you are looking for.
  • You can also order the sequence of filters if you ever need to. To do this, simply click inside a filter row and it will appear with a gray background (like the third filter in Figure 6-20). Then click the ellipses at the right of the filter row and select Move Up or Move Down from the popup menu.
  • To delete a filter, click the ellipses at the right of the filter row and select Delete.
Note
When you are dealing with really large datasets, you may find that a filter does not always show all the available values from the source data. This is because Power BI Query Editor has loaded only a sample subset of the data. In cases like these you will see an alert in the filter popup menu and a “Load more” link. Clicking this link will force Power BI to reload a larger sample set of data. However, memory restrictions may prevent it loading all the data that you need. In cases like this you should consider modifying the source query if possible so that it brings back a representative dataset that can fit into memory.

Grouping Records

At times, you will need to transform your original data in an extreme way—by grouping the data. This is very different from filtering data, removing duplicates, or cleansing the contents of columns. When you group data, you are altering the structure of the dataset to “roll up” records where you do the following:
  • Define the attribute columns that will become the unique elements in the grouped data table
  • Specify which aggregations are applied to any numeric columns included in the grouped table
Grouping is frequently an extremely selective operation. This is inevitable, since the more attribute (that is, non-numeric) columns you choose to group on, the more records you are likely to include in the grouped table. However, this will always depend on the particular dataset you are dealing with, and grouping data efficiently is always a matter of flair, practice, and good, old-fashioned trial and error.

Simple Groups

To understand how grouping works—and how it can radically alter the structure of your dataset—let’s see a simple example of row grouping in action:
  1. 1.
    In the sample file CH06Example1.pbix, click inside the Make column.
     
  2. 2.
    In either the Home ribbon or Transform ribbon, click the Group By button. The Group By dialog will appear, looking like the one in Figure 6-21.
    A370912_2_En_6_Fig21_HTML.jpg
    Figure 6-21.
    Simple grouping
     
  3. 3.
    Click OK. The dataset will now only contain the list of makes of vehicle and the number of records for each make. You can see this in Figure 6-22.
    A370912_2_En_6_Fig22_HTML.jpg
    Figure 6-22.
    Simple grouping output
     
Power BI Desktop will add a step named Grouped Rows to the Applied Steps list when you apply grouping to a dataset.
Note
The best way to cancel a grouping operation is to delete the Grouped Rows step in the Applied Steps list.
Although Power BI Desktop defaults to counting rows, there are several other operations that you can apply when grouping data. These are outlined in Table 6-9.
Table 6-9.
Aggregation Operations when Grouping
Aggregation Operation
Description
Count Rows
Counts the number of records.
Count Distinct Rows
Counts the number of unique records.
Sum
Returns the total for a numeric column.
Average
Returns the average for a numeric column.
Median
Returns the median value of a numeric column.
Min
Returns the minimum value of a numeric column.
Max
Returns the maximum value of a numeric column.
All Rows
Creates a table of records for each grouped element.

Complex Groups

Power BI Desktop can help you shape your datasets in more advanced ways by creating more complex data groupings. As an example, you could try out the following to group by make and model and add columns showing the total sales value and the average cost:
  1. 1.
    Open the sample file
    C:PowerBiDesktopSamplesCH06CH06Example1.pbix.
     
  2. 2.
    Click Edit Queries to open the Query Editor.
     
  3. 3.
    Select the following columns (by Ctrl-clicking the column headers):
     
  4. a.
    Make
     
  5. b.
    Model
     
  6. 4.
    In either the Home ribbon or Transform ribbon, click Group By.
     
  7. 5.
    In the New Column Name box, enter TotalSales.
     
  8. 6.
    Select Sum as the operation.
     
  9. 7.
    Choose SalePrice as the source column in the Column popup list.
     
  10. 8.
    Click the Add Aggregation button and repeat the operation, only this time, use the following:
    1. a.
      New Column Name: AverageCost
       
    2. b.
      Operation: Average
       
    3. c.
      Column: CostPrice
       
    The Group By dialog should look like the one in Figure 6-23.
    A370912_2_En_6_Fig23_HTML.jpg
    Figure 6-23.
    The Group By dialog
     
  11. 9.
    Click OK. All columns, other than those that you specified in the Group By dialog, are removed, and the table is grouped and aggregated, as shown in Figure 6-24. Grouped Rows will be added to the Applied Steps list. I have also sorted the table by the Make and Model columns to make the grouping easier to comprehend.
    A370912_2_En_6_Fig24_HTML.jpg
    Figure 6-24.
    Grouping a dataset
     
If you have created a really complex group and then realized that you need to change the order of the columns, all is not lost. You can alter the order of the columns in the output by clicking the ellipses to the right of each column definition in the Group By dialog and selecting Move Up or Move Down. The order of the columns in the Group By dialog will be the order of the columns (left to right) in the resulting dataset.
Note
You do not have to Ctrl-click to select the grouping columns. You can add them one by one to the Group By dialog by clicking the Add Aggregation button. Equally, you can remove grouping columns (or added and aggregated columns) by clicking the ellipses to the right of a column name and selecting Delete from the popup menu.

Saving Changes in the Query Editor

As you would expect, you can save any changes that you have made when using the Power BI Desktop Query Editor at any time. However, you need to be aware that when you click the Save icon above the ribbon (or if you click File ➤ Save), you will be presented with a choice in the dialog that you can see in Figure 6-25.
A370912_2_En_6_Fig25_HTML.jpg
Figure 6-25.
Applying pending changes before saving in the Query Editor
At this point you have to decide if you want only to save the work that you have done using the Query Editor, or if you want not only to save your work, but also update the data in the data model with the latest version of the data that results from the data transformation that you have carried out.
Consequently, you have to choose between:
  • Apply: Apply the changes and load the data to the data model.
  • Apply Later: Save your modifications but leave the data as it is currently.

Exiting the Query Editor

In a similar vein to the Save options just described, you can choose how to exit the Query Editor and return to your reports and dashboards in Power BI Desktop. The default option (when you click the Close & Apply button) is to apply all the changes that you have made to the data, update the data model with the new data, and return to Power BI Desktop.
However, you have two other options that may prove useful. These appear in the menu for the Close & Apply button:
  • Apply: Apply the changes that you have made to the data model. This may involve changes to the fields in the Fields list.
  • Close: Close the Query Editor but do not apply any changes.
If you choose not to apply the changes that you have made, then you will see the alert that is displayed in Figure 6-26. At some point you will have to update the data model to ensure that you are using the correct data for your reports by clicking the Apply Changes button in the alert that appears above the report canvas.
A370912_2_En_6_Fig26_HTML.jpg
Figure 6-26.
The pending changes alert

Conclusion

This chapter started you on the road to transforming datasets with Power BI Desktop. You saw how to trim datasets by removing rows and columns. You also learned how to subset a sample of data from a data source by selecting alternating groups of rows.
You also saw how to choose the columns that you want to use in reports, how to move columns around in the dataset, and how to rename columns so that your data is easily comprehensible when you use it later in dashboards and reports. Then, you saw how to filter and sort data, as well as how to remove duplicates to ensure that your dataset only contains the precise rows that you need for your upcoming visualizations. Finally, you learned how to group and aggregate data.
It has to be admitted, nonetheless, that preparing raw data for use in dashboards and reports is not always easy and can take a while to get right. However, Power BI Desktop Query can make this task really easy with a little practice. So now that you have grasped the basics, it is time to move on and discover some further data transformation techniques. Specifically, you will see how to transform and potentially cleanse the data that you have imported. This is the subject of the next chapter.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.173.78