©  Adam Aspin 2018
Adam AspinPro Power BI Desktophttps://doi.org/10.1007/978-1-4842-3210-1_7

7. Data Transformation

Adam Aspin
(1)
Stoke-on-Trent, Staffordshire, UK
 
Once a dataset has been shaped and filtered (as covered in the previous chapter), it probably still needs a good few modifications to make it ready for consumption. Many of these modifications are, at their heart, a selection of fairly simple yet necessary techniques that you apply to make the data cleaner and more standardized. I have chosen to group these approaches under the heading data transformation .
The sort of things that you may be looking to do before finally loading source data into the data model normally cover a range of processes that cleanse the data. They can include the following:
  • Change the data type for a column—by telling Power BI Desktop that the column contains numbers, for example
  • Ensure that the first row is used as headers (if this is required)
  • Remove part of a column’s contents
  • Replace the values in a cell with other values
  • Transform the column contents—by making the text uppercase, for instance, or by removing decimals from numbers
  • Fill data down or up over empty cells to ensure that records are complete
  • Apply math or statistical (or even trigonometric) functions to columns of numbers
  • Convert date or time data into date elements such as days, months, quarters, years, hours, or minutes
Transforming data does not only consist of reducing it. Sometimes you may have to extend the data to make it useable. This normally means adding further columns to a data table. The techniques to do this include
  • Duplicating column and possibly altering the format of the data in the copied column
  • Extracting part of the data in a column into a new column
  • Separating all the data in a column so that each data element appears in a separate column
  • Merging columns into a new column
  • Adding custom columns that possibly contain calculations or extract part of a column’s data into a new column, or even concatenate columns
  • Adding “index” columns to ensure uniqueness or memorize a sort order
This chapter will take you on a tour of these kinds of essential data transformations . Once you have finished reading it, you should be confident that you can take a rough and ready data source as a starting point and convert it into a polished and coherent data table that is ready to become a pivotal part of your Power BI Desktop data model. Not only that, but you will have carried out really heavy lifting much faster and more easily than you could have done using enterprise-level tools.
The sample data that you will need to follow the exercises in this chapter is in the folder C:PowerBiDesktopSamplesCH07.

Viewing a Full Record

Before even starting to cleanse data, you probably need to take a good look at it. While the Power BI Desktop Query Editor is great for scrolling up and down columns to see how data compares for a single field, it is often less easy to appreciate the entire contents of a single record.
So to avoid having to scroll frenetically left and right across rows of data, the Query Editor has another brilliantly simple solution. If you click a row (or more specifically, on the number of a row in the grid on the left), the Power BI Desktop Query Editor will display the contents of an entire record in a single window under the dataset . You can see an example of this in Figure 7-1.
A370912_2_En_7_Fig1_HTML.jpg
Figure 7-1.
Viewing a full record
Note
You can alter the relative height of the recordset and dataset windows simply by dragging the gray separator line between the upper and lower windows up or down.

Power BI Desktop Query Editor Context Menus

As is normal for Windows programs, Power BI Desktop Query Editor makes full use of context (or “right-click”) menus as an alternative to using the ribbons. When transforming datasets , there are three main context menus that you will probably find yourself using:
  • Table menu : This menu appears when you right-click the top corner of the grid containing the data.
  • Column menu : This menu appears when you right-click a column title.
  • Cell menu : This menu appears when you right-click a data cell.
While I have referred copiously to the context menus when explaining how to transform data, it is probably easier to take a quick look at them now so that you can see the various options. Figure 7-2 gives you a quick overview of these three context menus .
A370912_2_En_7_Fig2_HTML.jpg
Figure 7-2.
The Power BI Desktop Query Editor context menus
Because the options that are available in the context menus are explained throughout this, the previous, and the following chapters, I will not explain them all in detail here.

Using the First Row As Headers

Power BI Desktop is very good at guessing if it needs to take the first record of a source dataset and have it function as the column headers. This is fundamental for two reasons:
  • You avoid leaving the columns named Column1, Column2, and so on. Leaving them named generically like this would make it needlessly difficult for a user (or even yourself) to understand the data.
  • You avoid having a text element (which should be the column title) in a column of figures, which can cause problems later on. This is because a whole column needs to have the same data type for another data type to be applied. Having a header text in the first row prevents this for numeric and data/time data types, for instance. This could be because the header is a text whereas the remainder of the column contains numbers or dates.
Yet there could be—albeit rare—occasions when Power BI Desktop guesses incorrectly and assumes that the first record in a dataset is data when it is really the header information. So instead of headers, you have a set of generic column titles such as Column1, Column2, and so forth. Fortunately, correcting this and using the first row as headers is a simple task:
  • Click Use First Row As Headers in the Transform ribbon of the Power BI Desktop Query window.
After a few seconds, the first record disappears and the column titles become the elements that were in the first record. The Applied Steps list on the right now contains a Promoted Headers element, indicating which process has taken place. This step is highlighted.
Note
Power BI Desktop is often able to apply this step automatically when the source is a database. It can often correctly guess when the source is a file. However, it cannot always guess accurately, so sometimes you have to intervene. You can see if Power BI Desktop has had to guess this if it has added a Promoted Headers step to the Applied Steps list.
In the rare event that Power BI Desktop gets this operation wrong and presumes that a first row is column titles when it is not, you can reset the titles to be the first row by clicking the tiny triangle to the right of the Use First Row As Headers button. This displays a short menu where you can click the Use Headers As First Row option. The Applied Steps list on the right now contains a Demoted Headers element and the column titles are Column1, Column2, and so forth. You can subsequently rename the columns as you see fit.

Changing Data Type

A truly fundamental aspect of data modification is ensuring that the data is of the appropriate type; that is, if you have a column of numbers that are to be calculated at some point, then the column should be a numeric column. If it contains dates, then it should be set to one of the date or time data types. I realize that this can seem arduous and even superfluous; however, if you want to be sure that your data can be sliced and diced correctly further down the line, then setting the right data types at the outset is vital. An added bonus is that if you validate the data types early on in the process of loading data, you can see from the start if the data has any potential issues—dates that cannot be read as dates, for instance. This allows you to decide what to do with poor or unreliable data early in your work with a dataset .
The good news here is that for many data sources, Power BI Desktop applies an appropriate data type. Specifically, if you have loaded data from a database, then Power BI Desktop will recognize the data type for each column and apply a suitable native data type. Unfortunately, things can get a little more painful with file sources, specifically CSV, text, and (occasionally) Excel files, as well as some XML files. In the case of these file types, Power BI Desktop often tries to guess the data type, but there are times when it does not succeed. If it has made a stab at deducing data types, then you see a Changed Type step in the Applied Steps list. Consequently, if you are obtaining your data from these sources, then you could well be obliged to apply data types to many of the columns manually.
Note
In some cases, numbers are not meant to be interpreted as numerical data. For instance, a French postal code is five numbers, but it will never be calculated in any way. So it is good practice to let Power BI Desktop know this by changing the data type to Text.
Do the following to change data type for a column or a group of columns:
  1. 1.
    Open the sample file C:PowerBiDesktopSamplesCH07CH07Example1.pbix.
     
  2. 2.
    Click the Edit Queries button in the Home ribbon . The Query Editor will open.
     
  3. 3.
    Click inside the column whose data type you wish to change. If you want to modify several columns, then Ctrl-click the requisite column titles. In this example, you could select the CostPrice and TotalDiscount columns.
     
  4. 4.
    Click the Data Type button in the Transform ribbon . A pop-up menu of potential data types will appear.
     
  5. 5.
    Select an appropriate data type. If you have selected the CostPrice and TotalDiscount columns, then Whole Number is the type to choose.
     
After a few seconds, the data type will be applied. Changed Type will appear in the Applied Steps list. The data types that you can apply are outlined in Table 7-1.
Table 7-1.
Data Types in Power BI Desktop
Data Type
Description
Decimal Number
Converts the data to a decimal number.
Fixed Decimal Number
Converts the data to a decimal number with a fixed number of decimals.
Whole Number
Converts the data to a whole (integer) number.
Date/Time
Converts to a date and time data type.
Date
Converts to a date data type.
Time
Converts to a time data type.
Duration
Sets the data as being a duration. These are used for date and time calculations.
Text
Sets to a text data type.
True/False
Sets the data type to Boolean (True or False).
Binary
Defines the data as binary, and consequently, it is not directly visible.
Note
The Data Type button is also available in the Home ribbon . Equally, you can right-click a column header and select Change Type to select a different data type.
Inevitably, there will be times when you try to apply a data type that simply cannot be used with a certain column of data. Converting a text column (such as Make in this sample data table) into dates will simply not work. If you do this, then Power BI Desktop will replace the column contents with Error. This is not definitive or dangerous, and all you have to do to return the data to its previous state is to delete the Changed Type step in the Applied Steps list using the technique described in the previous chapter.
Sometimes you could try and change a data type when the data type has already been changed. In this case you will get an alert like the one shown in Figure 7-3.
A370912_2_En_7_Fig3_HTML.jpg
Figure 7-3.
The Change Column Type alert
If this occurs, you can do one of two things:
  • Let Power BI Desktop update the existing conversion step with the data type that you just selected.
  • Add a new conversion step.
Your choice will depend on exactly what type of transformation you are applying to the underlying dataset.
It can help to alter data types at the same time for a set of columns where you think that this operation is necessary. There are a couple of good reasons for this approach:
  • You can concentrate on getting data types right, and if you are working methodically, you are less likely to forget to set a data type.
  • Applying data types for many columns (even if you are doing this in several operations, to single or multiple columns) will only add a single step to the Applied Steps list.
Note
Don’t look for any data formatting options in Power BI Desktop Query; there aren’t any. This is deliberate since this tool is designed to structure, load, and cleanse data, but not to present it. You carry out the formatting in the Power BI Desktop Data View, as you will see in Chapter 10.

Detecting Data Types

Applying the correct data type to dozens of columns can be more than a little time-consuming. Fortunately, Power BI Desktop now contains an option to apply data types automatically to a whole table:
  1. 1.
    In the Transform ribbon , click the Detect Data Type button.
     
  2. 2.
    Changed Type will appear in the Applied Steps list. Most of the columns will have the correct data type applied.
     
This technique does not always give perfect results, and there will be times when you want to override the choice of data type that Power BI Desktop has applied. Yet it is nonetheless a welcome addition to the data preparation toolset that can save you considerable time when preparing a dataset.

Data Type Indicators

It would be singularly unproductive to have to guess which column was set to which data type. So Power BI Desktop comes to your aid by indicating, visually, the corresponding data type for each column. If you look closely to the left of each individual column header, you will see a tiny icon. Each icon specifies the column’s data type. The meaning of each icon is given in Table 7-2.
Table 7-2.
Data Type Icons in Power BI Desktop Query Editor
Data Type Icon
Description
A370912_2_En_7_Figa_HTML.jpg
Any data type from among the possible data types
A370912_2_En_7_Figb_HTML.jpg
Whole Number
A370912_2_En_7_Figc_HTML.jpg
Decimal Number
A370912_2_En_7_Figd_HTML.jpg
Fixed Decimal Number
A370912_2_En_7_Fige_HTML.jpg
Percentage
A370912_2_En_7_Figf_HTML.jpg
Text
A370912_2_En_7_Figg_HTML.jpg
True/False
A370912_2_En_7_Figh_HTML.jpg
Date/Time
A370912_2_En_7_Figi_HTML.jpg
Date
A370912_2_En_7_Figj_HTML.jpg
Time
A370912_2_En_7_Figk_HTML.jpg
Date/Time/Timezone
A370912_2_En_7_Figl_HTML.jpg
Duration
A370912_2_En_7_Figm_HTML.jpg
Binary

Switching Data Types

Another quick way to alter the data type for a column is to click the data type icon to the left of the column title and select the required data type from the context menu that you can see in Figure 7-4.
A370912_2_En_7_Fig4_HTML.jpg
Figure 7-4.
The data type context menu

Data Type Using Locale

When you are converting data types, you can also choose to use the current locale to specify date, time, and number formats. This means that users opening the Power BI Desktop file in another country will see date, time, and number formats adapted to the local formatting conventions. To do this:
  1. 1.
    Open the Query Editor (unless it is already open).
     
  2. 2.
    Click the data type icon to the left of the column title.
     
  3. 3.
    Select Using Locale from the popup menu. The Change Type with Locale dialog will appear.
     
  4. 4.
    Choose the new data type to apply from the list of available data types.
     
  5. 5.
    Select the required locale from the list of worldwide locales. The dialog will look like Figure 7-5.
    A370912_2_En_7_Fig5_HTML.jpg
    Figure 7-5.
    The Change Type with Locale dialog
     
  6. 6.
    Click OK.
     
The data type will be converted to the selected locale. The Applied Steps list will contain a step entitled Changed Type with Locale.

Replacing Values

Some data that you load will need certain values to be replaced by others in a kind of global search-and-replace operation—just as you would in a document. For instance, perhaps you need to standardize spellings where a make of car (to use the current sample dataset as an example) has been entered incorrectly. To carry out this particular data cleansing operation, do the following:
  1. 1.
    Click the title of the column that contains the data that you want to replace. The column will become selected. In this example, I used the Model column as an example.
     
  2. 2.
    In the Home ribbon , click the Replace Values button. The Replace Values dialog will appear.
     
  3. 3.
    In the Value To Find box, enter the text or number that you want to replace. I used Ghost in this example.
     
  4. 4.
    In the Replace With box, enter the text or number that you want to replace. I used Fantôme in this example, as shown in Figure 7-6.
    A370912_2_En_7_Fig6_HTML.jpg
    Figure 7-6.
    The Replace Values dialog
     
  5. 5.
    Click OK. The data is replaced in the entire column. Replaced Values is added to the Applied Steps list.
     
I only have a few comments about this technique:
  • The Replace Values process searches for every occurrence of the text that you are looking for in each record of the selected columns. It does not look for the entire contents of the cell unless you specifically request this by checking the Match Entire Cell Contents check box in the advanced options.
  • If you click a cell containing the contents that you want to replace (rather than the column title, as we just did), before starting the process, Power BI Desktop automatically places the cell contents in the Replace Values dialog as the value to find.
  • You can only replace text in columns that contain text elements. This does not work with columns that are set as a numeric or date data type.
  • If you really have to replace parts of a date or figures in a numeric column with other dates or numbers, then you can
    • Convert the column to a text data type
    • Carry out the replace operation
    • Convert the column back to the original data type
The Replace Values dialog also has a few advanced options that you can apply. You can see these if you expand the “Advanced options” item by clicking the triangle to its left. These options are explained in Table 7-3.
Table 7-3.
Advanced Replace Options
Option
Description
Match Entire Cell Contents
Only replaces the search value if it makes up the entire contents of the column for a row.
Replace Using Special Characters
Replaces the search value with a nonprinting character.
Tab
Replaces the search value with a tab character.
Carriage Return
Replaces the search value with a carriage return character.
Line Feed
Replaces the search value with a line feed character.
Carriage Return and Line Feed
Replaces the search value with a carriage return and line feed.
Note
Replacing words that are subsets of other words are dangerous. When replacing any data, make sure that you don’t damage elements other than the one you intend to change.
As a final and purely spurious comment, I must add that I would never suggest rebranding a Rolls-Royce, as it would be close to automotive sacrilege .

Transforming Column Contents

Power BI Desktop has a powerful toolbox of automated data transformations that allow you to standardize the contents of a column in several ways. These include
  • Setting the capitalization of text columns
  • Rounding numeric data or applying math functions
  • Extracting date elements such as the year, month, or day (among others) from a date column
Power BI Desktop is very strict about applying transformations to appropriate types of data. This is because transforms are totally dependent on the data type of the selected column. This is yet another confirmation that applying the requisite data type is an operation that should be carried out early in any data transformation process—and certainly before transforming the column contents. Remember, you will only be able to select a numeric transformation if the column is a numeric data type, and you will only be able to select a date transformation if the column is a date data type. Equally, the text-based transformations can only be applied to columns that are of the text data type.

Text Transformation

Let’s look at a simple transformation operation in action. As an example, I will get Power BI Desktop to convert the Make column into uppercase characters.
  1. 1.
    Still using the file CH07Example1.pbix, click anywhere in the column whose contents you wish to transform (Make, in this case).
     
  2. 2.
    In the Transform ribbon , click the Format button. A popup menu will appear.
     
  3. 3.
    Select UPPERCASE, as shown in Figure 7-7.
     
A370912_2_En_7_Fig7_HTML.jpg
Figure 7-7.
The Format menu
The contents of the entire column will be converted to uppercase. Uppercased Text will be added to the Applied Steps list.
As you can see from the menu for the Format button, you have five possible options when formatting (or transforming) text. These options are explained in Table 7-4.
Table 7-4.
Text Transformations
Transformation
Description
Applied Steps Definition
Lowercase
Converts all the text to lowercase.
Lowercased Text
Uppercase
Converts all the text to uppercase.
Uppercased Text
Capitalize Each Word
Converts the first letter of each word to a capital.
Capitalized Each Word
Trim
Removes all spaces before and after the text.
Trimmed Text
Clean
Removes any nonprintable characters.
Cleaned Text
Add Prefix
Adds text at the start of the column contents.
Added Prefix
Add Suffix
Adds text at the end of the column contents.
Added Suffix
Note
I realize that Power BI Desktop Query calls text transformations Formatting. Nonetheless, these options are part of the overall data transformation options.

Adding a Prefix or a Suffix

You can also add a prefix or a suffix to all the data in a column. This is as easy as:
  1. 1.
    Click inside the column where you want to add a prefix.
     
  2. 2.
    In the Transform ribbon , select Format ➤ Add Prefix. The Prefix dialog will be displayed, as you can see in Figure 7-8.
    A370912_2_En_7_Fig8_HTML.jpg
    Figure 7-8.
    Adding a prefix to a text
     
  3. 3.
    Enter the prefix to add in the Value field.
     
  4. 4.
    Click OK.
     
The prefix that you designated will be placed at the start of every record in the dataset for the selected field.
Note
If you add a prefix or a suffix to a numeric or date/time column, then the column data type will automatically be converted to text.

Removing Leading and Trailing Spaces

There will inevitably be occasions when you inherit data that has extra spaces before, after, or before and after the data itself. This can be insidious, as it can cause
  • Data duplication, because a value with a trailing space is not considered identical to the same text without the spaces that follow
  • Sort issues, because a leading space causes an element to appear at the top of a sorted list
  • Grouping errors, because elements with spaces are not part of the same group as elements without spaces
Fortunately, Power BI Desktop Query has a ruthlessly efficient solution to this problem.
  1. 1.
    Click anywhere in the column whose contents you wish to transform (Make, in this case).
     
  2. 2.
    In the Transform ribbon , click the Format button. A popup menu will appear.
     
  3. 3.
    Select Trim from the menu.
     
All superfluous leading and trailing spaces will be removed from the data in the column. This should help with sorting, grouping, and deduplicating records .

Removing Non-Printing Characters

Some source data can contain somewhat insidious elements called non-printing characters . These can, even if they are nearly always invisible to humans, cause problems when you print reports and dashboards.
If you suspect that your source data contains non-printing characters, you can remove them simply like this:
  1. 1.
    Click inside the column (or select the columns) that you know to contain (or that you suspect contain) non-printing characters.
     
  2. 2.
    Click Format ➤ Clean.
     
Power BI Desktop will add Cleaned Text to the list of Applied Steps.

Number Transformations

Just as you can transform the contents of text-based columns, you can also apply transformations to numeric values. As an example, suppose that you want to round up all the figures in a column to the nearest whole number.
  1. 1.
    Click anywhere in the column whose contents you wish to transform (TotalDiscount, in this case).
     
  2. 2.
    In the Transform ribbon , click the Rounding button. A popup menu will appear.
     
  3. 3.
    Select Round Up.
     
The values in the entire column will be rounded up to the nearest whole number. Rounded Up will be added to the Applied Steps list.
The other possible numeric transformations that are available are described in Table 7-5. Because these numeric transformations use several buttons in the Transform ribbon , I have indicated which button to use to get the desired result.
Table 7-5.
Number Transformations
Transformation
Description
Applied Steps Definition
Rounding ➤ Round Up
Rounds each number to the specified number of decimal places.
Rounded Up
Rounding ➤ Round Down
Rounds each number down.
Rounded Down
Round…
Rounds each number to the number of decimals that you specify. If you specify a negative number, you round to a given decimal.
Rounded Off
Scientific ➤ Absolute Value
Makes the number absolute (positive).
 
Scientific ➤ Power ➤ Square
Returns the square of the number in each cell.
Calculated Square
Scientific ➤ Power ➤ Cube
Returns the cube of the number in each cell.
Calculated Cube Value
Scientific ➤ Power ➤ Power
Raises each number to the power that you specify.
Calculated Power
Scientific ➤ Square Root
Returns the square root of the number in each cell.
Square Root
Scientific ➤ Exponent
Returns the exponent of the number in each cell.
Calculated Exponent
Scientific ➤ Logarithm ➤ Base 10
Returns the base 10 logarithm of the number in each cell.
Calculated Base 10 Logarithm
Scientific ➤ Logarithm ➤ Natural
Returns the natural logarithm of the number in each cell.
Calculated Natural Logarithm
Scientific ➤ Factorial
Gives the factorial of numbers in the column.
Calculated Factorial
Trigonometry ➤ Sine
Gives the sine of the numbers in the column.
Calculated Sine
Trigonometry ➤ Cosine
Gives the cosine of the numbers in the column.
Calculated Cosine
Trigonometry ➤Tangent
Gives the tangent of the numbers in the column.
Calculated Tangent
Trigonometry ➤ ArcSine
Gives the arcsine of the numbers in the column.
Calculated ArcSine
Trigonometry ➤ ArcCosine
Gives the arccosine of the numbers in the column.
Calculated ArcCosine
Trigonometry ➤ ArcTangent
Gives the arctangent of the numbers in the column .
Calculated ArcTangent
Note
Power BI Desktop Query will not even let you try to apply numeric transformation to texts or dates. The relevant buttons remain grayed out if you click inside a column of letters or dates.

Calculating Numbers

Power BI Desktop Query can also apply simple arithmetic to the figures in a column. Suppose, for instance, that you want to multiply all the sale prices by 110% as part of your forecasts. This is how you can do just that:
  1. 1.
    Click inside any column of numbers. In this example, I used the column SalePrice.
     
  2. 2.
    Click the Standard button in the Transform ribbon . The menu will appear.
     
  3. 3.
    Click Multiply. The Multiply dialog will appear.
     
  4. 4.
    Enter 1.1 in the Value box. The dialog will look like the one shown in Figure 7-9.
    A370912_2_En_7_Fig9_HTML.jpg
    Figure 7-9.
    Applying a calculation to a column
     
  5. 5.
    Click OK.
     
All the numbers in the selected column will be multiplied by 1.1. In other words, they are now 110% of the original value. Table 7-6 describes the possible math operations that you can carry out in Power BI Desktop Query.
Table 7-6.
Applying Basic Calculations
Transformation
Description
Applied Steps Definition
Add
Adds a selected value to the numbers in a column.
Added to Column
Multiply
Multiplies the numbers in a column by a selected value.
Multiplied Column
Subtract
Subtracts a selected value from the numbers in a column.
Subtracted from Column
Divide
Divides the numbers in a column by a selected value.
Divided Column
Integer-Divide
Divides the numbers in a column by a selected value and removes any remainder.
Integer-Divided Column
Modulo
Divides the numbers in a column by a selected value and leaves only the remainder.
Calculated Modulo
Percentage
Applies the selected percentage to the column.
Calculated Percentage
Percent Of
Expresses the value in the column as a percent of the value that you enter.
Calculated Percent Of
Note
You can also carry out many types of calculations in Power BI Desktop Data View and avoid carrying out calculations in the Query Editor. Indeed, many Power BI Desktop purists seem to prefer that anything resembling a calculation should take place inside the data model rather than at the Query stage. As ever, I will let you decide which approach you prefer. Yet I would advise you to read Chapters 11 through 13 to get a clearer understanding of how to add calculated elements to Power BI Desktop using DAX. This is because some transformations need to adjust to the situation (the context) in which they are used, and consequently need to be done using DAX. Be aware that some heavy transforms can slow the reports down if calculated at run time, whereas others can only be effective as part of a well-thought-out calculation process.
Finally, it is important to remember that you are altering the data when you carry out this kind of operation. In the real world, you might be safer duplicating a column before profoundly altering the data it contains. This allows you to keep the initial data available, albeit at the cost of increasing both the load time and the size of the Power BI Desktop file.

Date Transformations

Transforming dates follows similar principles to transforming text and numbers. As an example, here is how to isolate the month from a date:
  1. 1.
    Click inside the InvoiceDate column.
     
  2. 2.
    In the Transform ribbon , click the Date button. The menu will appear.
     
  3. 3.
    Click Year. The submenu will appear.
     
  4. 4.
    Select Year. The year part of the date will replace all the dates in the InvoiceDate column.
     
The other possible date transformations that are possible are given in Table 7-7.
Table 7-7.
Date Transformations
Transformation
Description
Applied Steps Definition
Age
Calculates the date and time difference (in days and hours) between the original date and the current local time.
Calculated Age
Date Only
Converts the data to a date without the time element.
Calculated Date
Year ➤ Year
Extracts the year from the date.
Calculated Year
Year ➤ Start of Year
Returns the first day of the year for the date.
Calculated Start of Year
Year ➤ End of Year
Returns the last day of the year for the date.
Calculated End of Year
Month ➤ Month
Extracts the number of the month from the date.
Calculated Month
Month ➤ Start of Month
Returns the first day of the month for the date.
Calculated Start of Month
Month ➤ End of Month
Returns the last day of the month for the date.
Calculated End of Month
Month ➤ Days in Month
Returns the number of days in the month for the date.
Calculated Days in Month
Month ➤ Name of Month
Returns the name of the month for the date.
Calculated Name of Month
Day ➤ Day
Extracts the day from the date.
Calculated Day
Day ➤ Day of Week
Returns the weekday as a number (Monday is 1, Tuesday is 2, etc.).
Calculated Day of Week
Day ➤ Day of Year
Calculates the number of days since the start of the year for the date.
Calculated Day of Year
Day ➤ Start of Day
Transforms the value to the start of the day for a date and time.
Calculated Start of Day
Day ➤ End of Day
Transforms the value to the end of the day for a date and time.
Calculated End of Day
Day ➤ Name of Day
Returns the weekday as a day of week.
Calculated Name of Day
Quarter ➤ Quarter
Returns the calendar quarter of the year for the date.
Calculated Quarter
Quarter ➤ Start of Quarter
Returns the first date of the calendar quarter of the year for the date.
Calculated Start of Quarter
Quarter ➤ End of Quarter
Returns the last date of the calendar quarter of the year for the date.
Calculated End of Quarter
Week ➤ Week of Year
Calculates the number of weeks since the start of the year for the date.
Calculated Week of Year
Week ➤ Week of Month
Calculates the number of weeks since the start of the month for the date.
Calculated Week of Month
Week ➤ Start of Week
Returns the date for the first day of the week (Monday) for the date.
Calculated Start of Week
Week ➤ End of Week
Returns the date for the last day of the week (Monday) for the date.
Calculated End of Week

Time Transformations

You can also transform date/time or time values into their component parts using Power BI Desktop Query. This is extremely similar to how you apply date transformations, but in the interest of completeness, the following explains how to do this:
  1. 1.
    Click inside the InvoiceDate column.
     
  2. 2.
    In the Transform ribbon , click the Time button. The menu will appear.
     
  3. 3.
    Click Hour. The hour part of the time will replace all the values in the InvoiceDate column.
     
Note
Time transformations can only be applied to columns of the date/time or time data types.
The range of Time transformations is given in Table 7-8.
Table 7-8.
Time Transformations
Transformation
Description
Applied Steps Definition
Time Only
Isolates the time part of a date and time.
Extracted Time
Local Time
Converts the date/time to local time from date/time and timezone values.
Extracted Local Time
Parse
Extracts the date and/or date/time elements from a text.
Parsed DateTime
Hour ➤ Hour
Isolates the hour from a date/time or date value.
Extracted Hour
Hour ➤ Start of Hour
Returns the start of the hour from a date/time or time value.
Calculated Start of Hour
Hour ➤ End of Hour
Returns the end of the hour from a date/time or time value.
Calculated End of Hour
Minute
Isolates the minute from a date/time or time value.
Extracted Minute
Second
Isolates the second from a date/time or time value.
Extracted Second
Earliest
Returns the earliest time from a date/time or time value.
Calculated Earliest
Latest
Returns the latest time from a date/time or time value .
Calculated Latest
Note
In the real world, you could well want to leave a source column intact and apply number or date transformations to a copy of the column. To do this, simply apply the same transformation technique, only use the buttons in the Add Column ribbon instead of those in the Transform ribbon .

Duration

If you have values in a column that can be interpreted as a duration (in days, hours, minutes, and seconds), then Power BI Desktop Query can extract the component parts of the duration as a data transformation . For this to work, however, the column must be set to the duration data type. This means that the contents of the column have to be interpreted as a duration by Power BI Desktop. Any values that are incompatible with this data type will be set to error values.
If you have duration data, you can extract its component parts like this:
  1. 1.
    Click inside the column.
     
  2. 2.
    In the Transform ribbon , click the Duration button. The menu will appear.
     
  3. 3.
    Click Hour. The hour part of the time will replace all the values in the InvoiceDate column.
     
The range of duration transformations is given in Table 7-9.
Table 7-9.
Duration Transformations
Transformation
Description
Applied Steps Definition
Days
Isolates the day element from a duration value.
Extracted Days
Hours
Isolates the hour element from a duration value.
Extracted Hours
Minutes
Isolates the minutes element from a duration value.
Extracted Minutes
Seconds
Isolates the seconds element from a duration value.
Extracted Seconds
Total Days
Displays the duration value as the number of days and a fraction representing hours, minutes, and seconds.
Calculated Total Days
Total Hours
Displays the duration value as the number of hours and a fraction representing minutes and seconds.
Calculated Total Hours
Total Minutes
Displays the duration value as the number of minutes and a fraction representing seconds.
Calculated Total Minutes
Total Seconds
Displays the duration value as the number of seconds and a fraction representing milliseconds .
Calculated Total Seconds
Multiply
Multiplies the duration (and all its component parts) by a value that you enter.
Multiplied Column
Divide
Divides the duration (and all its component parts) by a value that you enter.
Divided Column
Statistics ➤ Sum
Returns the total for all the duration elements in the column.
Calculated Sum
Statistics ➤ Minimum
Returns the minimum value of all the duration elements in the column.
Calculated Minimum
Statistics ➤ Maximum
Returns the maximum value of all the duration elements in the column.
Calculated Maximum
Statistics ➤ Median
Returns the median value for all the duration elements in the column.
Calculated Median
Statistics ➤ Average
Returns the average for all the duration elements in the column.
Calculated Average
Note
If you multiply or divide a duration, Power BI Desktop Query displays a dialog so that you can enter the value to multiply or divide the duration by.

Filling Down Empty Cells

Imagine a data source where the data has come into Power BI Desktop from a matrix-style structure. The result is that some columns only contain a single example of an element and then a series of empty cells until the next element in the list. If this is difficult to imagine, then take a look at the sample file CarMakeAndModelMatrix.xlsx shown in Figure 7-10.
A370912_2_En_7_Fig10_HTML.jpg
Figure 7-10.
A matrix data table in Excel
All these blank cells are a problem since we need a full data table—or rather, they would be, if Power BI Desktop did not have a really cool way of overcoming this particular difficulty. Do the following to solve this problem:
  1. 1.
    Open a new Power BI Desktop file.
     
  2. 2.
    In the splash screen, click Get Data.
     
  3. 3.
    In the Get Data dialog, select Excel. Then Click Connect and navigate to C:PowerBiDesktopSamplesCH07CarMakeAndModelMatrix.xlsx.
     
  4. 4.
    Click Open, select Sheet 1, and click Edit.
     
  5. 5.
    Click Edit Queries. This will take you directly to the Query Editor.
     
  6. 6.
    Click in the column that contains the empty cells; make sure that you click where you want to replace the empty cells with the contents of the first non-empty cell above.
     
  7. 7.
    In the Transform ribbon , click Fill. The menu will appear.
     
  8. 8.
    Select Down. The blank cells will be replaced by the value in the first non-empty cell above. Filled Down will be added to the Applied Steps list.
     
The table will now look like Figure 7-11.
A370912_2_En_7_Fig11_HTML.jpg
Figure 7-11.
A data table with empty cells replaced by the correct data
Note
This technique is built to handle a fairly specific problem and only really works if the imported data is grouped by the column containing the missing elements.
Although rare, you can also use this technique to fill empty cells with the value from below. If you need to do this, just select Fill ➤ Up from the Transform ribbon . In either case, you need to be aware that the technique is applied to the entire column .

Extracting Part of a Column’s Contents

There could well be times when the contents of a source column contain more data than you actually need. In cases like this, Power BI Desktop can help you by extracting only part of a column. This technique works like this:
  1. 1.
    Load the C:PowerBiDesktopSamplesCH07CH07Example1.pbix sample file and click Edit Queries.
     
  2. 2.
    Click inside the InvoiceNumber column.
     
  3. 3.
    In the Transform ribbon , click Extract ➤ Text Before Delimiter. The Text Before Delimiter dialog will be displayed.
     
  4. 4.
    Enter a hyphen (or a minus sign) in the Delimiter field. The dialog will look like Figure 7-12.
    A370912_2_En_7_Fig12_HTML.jpg
    Figure 7-12.
    The Text Before Delimiter dialog
     
  5. 5.
    Click OK. The contents of the field will be replaced by the characters before the hyphen. A step named Extracted Text Before Delimiter will be added to the Applied Steps list.
     
The Extract function allows you to choose from a variety of ways in which you can extract a subset of data from a column. The currently available options are explained in Table 7-10.
Table 7-10.
Extract Transformations
Transformation
Description
Applied Steps Definition
Length
Displays the length in characters of the contents of the field.
Extracted Length
First Characters
Displays a specified number of characters from the left of the field.
Extracted First Characters
Last Characters
Displays a specified number of characters from the right of the field.
Extracted Last Characters
Range
Displays a specified number of characters between a specified start and end position (in characters, from the left of the field).
Extracted Range
Text Before Delimiter
Displays all the text occurring before a specified character.
Extracted Text Before Delimiter
Text After Delimiter
Displays all the text occurring after a specified character.
Extracted Text After Delimiter
Text Between Delimiters
Displays all the text occurring between two specified characters .
Extracted Text Between Delimiters

Advanced Extract Options

Three of the Extract options (Text Before Delimiter, Text After Delimiter, and Text Between Delimiters) let you apply some advanced options that allow you to push the envelope even further when extracting data from a column. These techniques are explained in the following two sections.

Text Before and After Delimiter

If you are extracting part of the contents of a column and you are using a delimiter to isolate the text you want to keep, then you have a couple of additional options available.
You can access these options from the dialog that you saw in Figure 7-8 by clicking Advanced Options. The dialog will then look like the one shown in Figure 7-13.
A370912_2_En_7_Fig13_HTML.jpg
Figure 7-13.
The Advanced Options of the Text Before and Text After Delimiter dialogs
The two options that you now have are:
  • Scan for the delimiter: This option lets you choose between working forward from the start of the contents of the column or working backward from the end of the contents of the column to locate the delimiter you are searching for.
  • Number of delimiters to skip: Here you can specify that it is the nth occurrence of a delimiter that interests you.

Text Between Delimiters

The Advanced Options of the Text Between Delimiters dialog essentially lets you apply the same options that you saw previously, only for both the initial delimiter and the final delimiter. In Figure 7-14 you can see this in the Text Between Delimiters dialog.
A370912_2_En_7_Fig14_HTML.jpg
Figure 7-14.
The Advanced Options of the Text Between Delimiters dialog
Note
The Extract button can be found in both the Transform and New Column ribbons. If you carry out this operation from the Transform ribbon , then the contents of the existing column will be replaced. If you use the button in the Add Column ribbon , then a new column containing the extracted text will be added at the right of any existing columns.

Duplicating Columns

Sometimes you just need a simple copy of a column, with nothing added and nothing taken away. This is where the Duplicate Column button comes into play.
  1. 1.
    Load the C:PowerBiDesktopSamplesCH07CH07Example1.pbix sample file.
     
  2. 2.
    Open the Power BI Desktop Query Editor.
     
  3. 3.
    Click inside (or on the title of) the column that you want to duplicate. I will use the Make column in this example.
     
  4. 4.
    In the Add Column ribbon , click the Duplicate Column button. After a few seconds, a copy of the column is created at the right of the existing table. Duplicated Column will appear in the Applied Steps list.
     
  5. 5.
    Scroll to the right of the table and rename the existing column; it is currently named Make-Copy.
     
Note
The duplicate column is named Original Column Name-Copy. I find that it helps to rename copies of columns sooner rather than later in a data mashup process.

Splitting Columns

Sometimes a source column contains data that you really need to break up into smaller pieces across two or more columns. The following are classic cases where this happens:
  • A column contains a list of elements, separated by a specific character (known as a delimiter ).
  • A column contains a list of elements, but the elements can be divided at specific places in the column.
  • A column contains a concatenated text that needs to be split into its composite elements (a bank account number or a Social Security number are examples of this).
The following short sections explain how to handle such eventualities.

Splitting Column by a Delimiter

Here is another requirement that you may encounter occasionally. The data that has been imported has a column that needs to be further split into multiple columns. Imagine a text file where columns are separated by semicolons, and these subdivisions each contain a column that holds a comma-separated list of elements. Once you have imported the file, you then need to further separate the contents of this column that uses a different delimiter.
Here is what you can do to split the data from one column over several columns:
  1. 1.
    Edit the C:PowerBiDesktopSamplesCH07DataToParse.xlsx sample file in the Query Editor.
     
  2. 2.
    In the Transform ribbon , click Use First Row As Headers.
     
  3. 3.
    Click inside the ClientList column. You can see that this column contains several data elements, each separated by a semicolon.
     
  4. 4.
    In the Transform ribbon , click Split Column ➤ By Delimiter. The Split Column by Delimiter dialog appears.
     
  5. 5.
    Select Semicolon from the list of available options in the “Select or enter delimiter” popup (although the Query Editor could well have detected this already).
     
  6. 6.
    Click “Each occurrence of the delimiter” as the location to split the text column. The dialog should look like Figure 7-15.
    A370912_2_En_7_Fig15_HTML.jpg
    Figure 7-15.
    Splitting a column using a delimiter
     
  7. 7.
    Click OK. Split Column by Delimiter will appear in the Applied Steps list.
     
The initial column is replaced and all the new columns are named InvoiceNumber.1, InvoiceNumber.2, and so forth. As many additional columns as there are delimiters are created; each is named (Column.n) and is sequentially numbered. The result of this operation looks like Figure 7-16.
A370912_2_En_7_Fig16_HTML.jpg
Figure 7-16.
The results of splitting a column
This particular process has several options, and their consequences can be fairly far-reaching as far as the data is concerned. Table 7-11 contains a description of the available options.
Table 7-11.
Delimiter Split Options
Option
Description
Colon
Uses the colon (:) as the delimiter.
Comma
Uses the comma (,) as the delimiter.
Equals Sign
Uses the equals sign (=) as the delimiter.
Semi-Colon
Uses the semicolon (;) as the delimiter.
Space
Uses the space ( ) as the delimiter.
Tab
Uses the tab character as the delimiter.
Custom
Lets you enter a custom delimiter.
At the Left-Most Delimiter
Splits the column once only at the first occurrence of the delimiter.
At the Right-Most Delimiter
Splits the column once only at the last occurrence of the delimiter.
At Each Occurrence of the Delimiter
Splits the column into as many columns as there are delimiters.
Split into Columns
This leaves the number of rows as it is in the dataset and creates new columns for each new element resulting from the split operation.
Split into Rows
Creates a new row for each new element resulting from the split operation and duplicates the existing record as many times as there are split elements.
Advanced Options ➤ Number of Columns to Split Into
Allows you to set a maximum number of columns into which the data is split in chunks of the given number of characters. Any extra columns are placed in the rightmost column.
Advanced Options ➤ Quote Character
Separators inside a text that is contained in double quotes are not used to split the text into columns. Setting this option to “none” will split elements inside quotes.
Split using special characters
Enables the Insert Special Character button. You can then click this button and select the special character to split data on. The choice is between: Tab, Carriage Return, Line-Feed, Carriage Return, and Line-Feed or non-breaking space.

Splitting Columns by Number of Characters

Another variant on this theme is when text in each column is a fixed number of characters and needs to be broken down into constituent parts at specific intervals. Suppose, for instance, that you have a field where each group of (a certain number of) characters has a specific meaning, and you want to break it into multiple columns. Alternatively, suppose you want to extract the leftmost or rightmost n characters and leave the rest. A bank account or Social Security number are examples of this. This is where splitting a column by the number of characters can come in useful. As the principle is very similar to the process that we just saw, I will not repeat the whole thing again. All you have to do is choose the “By number of characters” menu option at step 5 in the previous exercise. Options for this type of operation are given in Table 7-12.
Table 7-12.
Options When Splitting a Column by Number of Characters
Option
Description
Number of Characters
Lets you define the number of characters of data before splitting the column.
Once, As Far Left As Possible
Splits the column once only at the given number of characters in from the left.
Once, As Far Right As Possible
Splits the column once only at the given number of characters in from the right.
Repeatedly
Splits the column as many times as necessary to cut it into segments every defined number of characters.
Advanced Options ➤ Number of Columns to Split Into
Allows you to set a maximum number of columns into which the data is split in chunks of the given number of characters. Any extra columns are placed in the rightmost column.
Split into Columns
This leaves the number of rows as it is in the dataset and crates new columns for each new element resulting from the split operation.
Split into Rows
Creates a new row for each new element resulting from the split operation and duplicates the existing record as many times as there are split elements.
There are a couple of things to note when splitting columns :
  • When splitting by a delimiter, Power BI Desktop makes a good attempt at guessing the maximum number of columns into which the source column must be split. If it gets this wrong (and you can see what its guesstimate is if you expand the Advanced Options box), you can override the number here.
  • If you select a Custom Delimiter , Power BI Desktop displays a new box in the dialog where you can enter a specific delimiter.
  • Not every record has to have the same number of delimiters. Power BI Desktop simply leave the rightmost column(s) blank if there are fewer split elements for a row.
Note
You can only split columns if they are text data. The Split Column button remains grayed out if your intention is to try to split a date or numeric column.

Merging Columns

You may be feeling a certain sense of déjà vu when you read the title of this section. After all, we saw how to merge columns (that is, how to fuse the data from several columns into a single, wider column) in a previous chapter, did we not?
Yes, we did indeed. However, this is not the only time in this chapter that you will see something that you have tried previously. This is because Power BI Desktop Query repeats several of the options that are in the Transform Ribbon in the Add Column Ribbon . While these functions all work in much the same way, there is one essential difference. If you select an option from the Transform Ribbon, then the column(s) that you selected is modified. If you select a similar option from the Add Column Ribbon, then the original column(s) will not be altered, but a new column is added containing the results of the data transformation.
Merging columns is a case in point. Now, as I went into detail as to how to execute this kind of data transformation in the previous chapter, I will not describe it all over again here. Suffice it to say, if you Ctrl-click the headings of two or more columns and then click Merge Columns in the Add Column ribbon , you will still see the data from the selected columns concatenated into a single column. However, this time the original columns remain in the dataset. The new column is named Merged, exactly as was the case for the first of the columns that you selected when merging columns using the Transform ribbon .
The following are other functions that can either overwrite the data in existing columns or display the result as a new column:
  • Format: Trims or changes the capitalization of text.
  • Extract: Takes part of a column and creates another column from this data.
  • Parse: Adds a column containing the source column data as JSON or XML strings.
  • Statistics: Creates a new column of aggregated numeric values.
  • Standard: Creates a new column of calculated numeric values.
  • Scientific: Creates a new column by applying certain kinds of math operations to the values in a column.
  • Trigonometry: Creates a new column by applying certain kinds of trigonometric operations to the values in a column.
  • Rounding: Creates a new column by rounding the values in a column.
  • Information: Creates a new column Indicating arithmetical information about the values in a column.
  • Date: Creates a new column by extracting date elements from the values in a date column.
  • Time: Creates a new column by extracting time elements from the values in a time or date/time column.
  • Duration: Creates a new column by calculating the duration between two dates or date/times.
When transforming data, the art is to decide whether you want or need to keep the original column before applying one of these functions. Yet, once again, it is not really fundamental if you later decide that you made an incorrect decision, as you can always backtrack. Alternatively, you can always decide to insert new columns as a matter of principle and delete any columns that you really do not need at a later stage in the data transformation process.

Custom Columns

Another way to extend the original data table is to add more columns. Although these are known as custom columns in Power BI Desktop, they are also known more generically as derived columns or calculated columns. Although they can do many things, their essential role is to
  • Concatenate (or join, if you prefer) existing columns.
  • Add calculations to the data table.
  • Extract a specific part of a column.
  • Add flags to the table based on existing data.
The best way to understand these columns is probably to see them in action. You can then extend these principles in your own processes.
Initially, let’s perform a column join and create a column named Vehicle, which concatenates the Make and Model columns with a space in between.
  1. 1.
    Load the C:PowerBiDesktopSamplesCH07CH07Example1.pbix sample file.
     
  2. 2.
    Click Edit Queries.
     
  3. 3.
    In the Add Column ribbon , click Custom Column. The Add Custom Column dialog is displayed.
     
  4. 4.
    Click the Make column in the column list on the right, then click the Insert button; =[Make] will appear in the Custom Column Formula box at the left of the dialog.
     
  5. 5.
    Enter & “ ” & in the Custom Column Formula box after =[Make].
     
  6. 6.
    Click the Model column in the column list on the right, and then click the Insert button.
     
  7. 7.
    Click inside the New Column Name box and enter a name for the column. I call it CarType. The dialog will look like Figure 7-17.
    A370912_2_En_7_Fig17_HTML.jpg
    Figure 7-17.
    The Add Custom Column dialog
     
  8. 8.
    Click OK. The new column is added to the right of the data table; it contains the results of the formula. Inserted Column appears in the Applied Steps list.
     
You can always double-click a column to insert it into the Custom Column Formula box if you prefer. To remove a column, simply delete the column name (including the square brackets) in the Custom Column Formula box.
Note
You must always enclose a column name in square brackets.
Rather than take you step by step through other examples, I prefer to show you some of the formulas that you can use to calculate columns and extract data into a new column. These code snippets are given in Table 7-13.
Table 7-13.
Custom Column Code Examples
Output
Code Snippet
Description
Column Calculations
= [SalePrice]-[CostPrice]
Subtracts the Cost Price from the Sale Price to give the Gross Margin
Column Arithmetic
=[SalePrice] * 1.2
Adds the UK sales tax (20%) to the Net Sale Price
Left
Text.Start([Make],3)
Returns the first three characters from the Make column
Right
Text.End([Make],3)
Returns the last three characters from the Make column
Up to a specific character
Text.Start([Make],Text.PositionOf([Make]," "))
Returns the leftmost characters up to the first space
Note
If you are an Excel user, you can probably see a distinct similarity with how you build formulas in Excel and Power Pivot, except that here (as in Power Pivot) you use column names rather than cell references.
If you look ahead to Chapters 12-14, then you are probably wondering why you carry out operations like this in Power BI Desktop Query when you can do virtually the same thing in the data model. Well, it is true that there is some overlap; so you have the choice of which to use. You can perform certain operations at multiple stages in the data preparation and analysis process. It all depends on how you are using the data and with what tool you are carrying out the analyses.
The last three examples in Table 7-13 probably seem a little abstruse for the moment. This is because they are examples of how to use the Power BI data transformation language. This language is normally referred to as “M.” Plumbing the depths of this language is outside the scope of this book, unfortunately.

Creating Columns from Examples

Creating your own columns can be a little scary if you have not had much experience with Excel or Power Pivot formulas, so the Power BI Desktop development team has tried to make your life easier by adding another way to create custom columns . Instead of referring to columns by the column name (and having to handle square brackets and other peculiar characters), you can build a new column by using the actual data in a row.
The following steps show an example of how to do this:
  1. 1.
    Load the C:PowerBiDesktopSamplesCH07CH07Example1.pbix sample file.
     
  2. 2.
    Click Edit Queries.
     
  3. 3.
    In the Add Column ribbon , click Column From Examples. A new kind of formula bar will appear above the data. It will look like Figure 7-18. At the same time, a new, empty column will be created at the right of the existing data.
    A370912_2_En_7_Fig18_HTML.jpg
    Figure 7-18.
    Creating a column from examples
     
  4. 4.
    Double-click in the new column on the right. A list of data from each field will be displayed, as shown in Figure 7-19.
    A370912_2_En_7_Fig19_HTML.jpg
    Figure 7-19.
    Displaying the data from a row when creating a column from examples
     
  5. 5.
    Double-click Red to select the data from the Color column.
     
  6. 6.
    Enter a space, a hyphen, and a space, then type Camargue (this is the name of the model for this row).
     
  7. 7.
    Click OK in the formula bar at the top.
     
Power BI Desktop will add a new column containing the color, a separator, and the Model. Inserted Merged Column will be added as a new step in the Applied Steps list.
As you can see from this short example , creating columns by example lets you use the data from a column rather than the column name. It also removes the need for double quotes and ampersand characters that you had to use when writing the code to create a new column in the previous section.
Tip
If you select Column From Examples ➤ From Selection, then you will only see data from the selected columns when you double-click inside the new column to see samples of data as you did in step 6 of this example.

Adding Conditional Columns

Not all additional columns are a simple extraction or concatenation of existing data. There will be times when you will want to apply some simple conditions that define the contents of a new column. This is where Power BI Desktop’s Conditional Column function comes into its own.
Conditional Columns are probably best understood with the aid of a practical example. So let’s suppose that you want to add a column that contains a comment on the type of buyer for Brilliant British Cars’s products. Here is how you can do this:
  1. 1.
    Load the C:PowerBiDesktopSamplesCH07CH07Example1.pbix sample file.
     
  2. 2.
    Click Edit Queries.
     
  3. 3.
    In the Add Column ribbon , click Conditional Column. The Add Conditional Column dialog will appear.
     
  4. 4.
    Enter BuyerType in the “New column name” field.
     
  5. 5.
    Select Make as the column name.
     
  6. 6.
    Leave Equals as the operator.
     
  7. 7.
    Enter Rolls Royce as the value.
     
  8. 8.
    Enter Posh as the output.
     
  9. 9.
    Click Add Rule .
     
  10. 10.
    Select Make as the column name, leave Equals as the operator, enter Bentley as the value, and add Classy as the output.
     
  11. 11.
    Enter Bling in the Otherwise field. The dialog will look like Figure 7-20.
    A370912_2_En_7_Fig20_HTML.jpg
    Figure 7-20.
    The Add Conditional Column dialog
     
  12. 12.
    Click OK. The new column will be added containing either Posh, Classy, or Bling, depending on the make for each record. Added Conditional Columns will appear as the new step in the Applied Steps list.
     
As you can see from the Add Conditional Column dialog, it has a range of options that you can tweak when defining the logic for the data matching. These options are outlined in Table 7-14.
Table 7-14.
Custom Column Operators
Operator
Description
Equals
Sets the text that must match the contents of the selected field for the output to be applied.
Does Not Equal
Sets the text that must not match the contents of the selected field for the output to be applied.
Begins With
Sets the text at the left of the selected field for the output to be applied.
Does Not Begin With
Sets the text that must not appear at the left of the selected field for the output to be applied.
Ends With
Sets the text at the right of the selected field for the output to be applied.
Does Not End With
Sets the text that must not appear at the right of the selected field for the output to be applied.
Contains
Sets the text that can appear anywhere in the selected field for the output to be applied.
Does Not Contain
Sets the text that cannot appear anywhere in the selected field for the output to be applied.
It is also worth noting that the comparison value , the output, and the alternative output can be values (as was the case in this example), columns, or parameters (which you will learn about in Chapter 9). If you want to remove a rule, simply click the ellipses at the right of the required rule and select Delete.
Tip
Should you wish to alter the order of the rules in the Add Conditional Column dialog, all you have to do is click the ellipses at the right of the selected rule and select Move Up or Move Down from the popup menu.

Index Columns

An index column is a new column that numbers every record in the table sequentially. This numbering scheme applies to the table, because it is currently sorted and begins at zero. There are many situations where an index column can be useful. The following are some examples:
  • Reapply a previous sort order.
  • Create a unique reference for every record.
  • Prepare a recordset for use as a dimension table in a Power BI Desktop data model. In cases like this, the index column becomes what dimensional modelers call a surrogate key.
This list is not intended to be exhaustive in any way; you will almost certainly find other uses as you work with Power BI Desktop. Whatever the need, here is how to add an index column:
  1. 1.
    In the Add Column ribbon , click Index Column. The new, sequentially numbered column is added at the right of the table, and Added Index is added to the Applied Steps list.
     
  2. 2.
    Scroll to the right of the table and rename the index column; it is currently named Index.
     
You have a fairly free hand when it comes to deciding how to begin numbering an index column . The choices are as follows:
  • Start at 0 and increment by a value of 1 for each row.
  • Start at 1 and increment by a value of 1 for each row.
  • Start at any number and increase by any number.
As you saw in step 1, the default is for Power BI Desktop Query to begin numbering rows at 0. However, you can choose another option by clicking the small triangle to the right of the Add Index Column button. This displays a menu with the three options outlined.
Selecting the third option, Custom , displays the dialog that you see in Figure 7-21.
A370912_2_En_7_Fig21_HTML.jpg
Figure 7-21.
The Add Index Column dialog
This dialog lets you specify the start number for the first row in the dataset as well as the increment that is added for each record.

Conclusion

In this chapter, you learned some essential techniques that you can use to cleanse and extend datasets . You saw how to round numbers up and down, how to deliver conformed text presentation, and how to remove extraneous spaces and non-printing characters from columns of data.
You also saw how to replace values inside columns, as well as ways of applying mathematical, statistical, and trigonometric functions to numbers. Other techniques covered extracting date, time, and duration elements from date/time and duration columns.
Finally, you saw a series of techniques that help you to add new columns based on the data in existing columns. These range from simple copies of an entire column or combining columns to extracting parts of a column’s data or even deducing different data that is added to a new column using simple logic.
It is now time to see how you can join hitherto separate datasets into single queries, and parse complex data types to add them to a dataset. You will even learn how to load multiple files in a single query and how to pivot and unpivot data. All of his will be the subject of the next chapter.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.186.124