CHAPTER 12

image

Discovering and Loading Data with Power Query

Before you can present any analysis or insight, you need source data. Your source data could be in many places and in many formats. Nonetheless, you need to access it, look at it, and quite possibly clean it up to some extent. You may also need to join separate data sources before you can shape the data into a coherent data set using PowerPivot, deliver the results using Power View or Power Map, and then share it using Power BI.

Discovering, loading, cleaning, and modifying source data is where Power Query comes in. Using this, the latest addition to the Microsoft self-service business intelligence stack, you can carry out

  • Data Discovery—Find and connect to a myriad of data sources containing potentially useful data. This can be from both public and private data sources.
  • Data Loading—Select the data you have examined and load it into Power Query for shaping.
  • Data Modification—Modify the structure of each data table that you have imported, filter and clean the data itself, and then join any separate data sources (we will look at this in detail in Chapter 13).

Although I have outlined these three steps as if they are completely separate and sequential, the reality is that they often blend into a single process. Indeed there could be many occasions when you will examine the data after it has been loaded into Power Query—or join data tables before you clean them. The core objective will, however, always remain the same: find some data and then load it into Power Query where you can tweak, clean, and shape it.

This process could be described simplistically as “First, catch your data.” In the world of data warehousing, the specialists call it ETL, which is short for Extract Transform Load. Despite the reassuring confidence that the acronym brings, this process is rarely a smooth logical progression through a clear-cut series of processes. The reality is often far messier than that. You may often find yourself importing some data, cleaning it, importing some more data from another source, combining the second data set with the first one, cleaning some more, and then repeating many of these operations several times.

In this chapter and the following one then, I will try and show you how the process can work in an ideal world. I hope that this will make the various steps that comprise an ETL process clearer. All I am asking is that you remain aware that all the options and possibilities that Power Query offers make it a multifaceted and tremendously capable tool. The science is to know which options to use. The art is to know when to use them.

In this chapter we will begin by seeing how to find and load data from a variety of sources. Once again I will be using a set of example files that you can find on the Apress web site. If you have followed the instructions in Appendix A, then these files will be in the C:HighImpactDataVisualizationWithPowerBI folder. Now, although I will be re-creating the source data set that we have used up until now in this book, I will also be going beyond that and importing data that is not used in the Power View, PowerPivot, and Power Map chapters. This is because the sheer variety of techniques that can be applied to load and transform data goes beyond those that you can demonstrate with a simple data set. Consequently, at times, the examples I use will not necessarily lead into a structured data set, but I will use them as just that—examples—to show you a specific solution to a particular ETL problem.

The Power Query Interface

You access Power Query using the Power Query tab and ribbon that appear in Excel once you’ve downloaded the Power Query add-in and installed it (as described in Chapter 1). Similar to the way we worked with PowerPivot, you begin working in Excel, using the options available in the Power Query ribbon, and then you progress to the Power Query window where you will carry out most of your work.

The Power Query ribbon essentially lets you find data and connect to it. It will also show a list of all the queries that you have developed in the current workbook and let you manage queries. It will be your starting point for both developing new queries and using or modifying existing queries.

The Power Query Ribbon

The Power Query ribbon contains the elements that are explained in Figure 12-1.

9781430266167_Fig12-01.jpg

Figure 12-1. The Power Query ribbon

Table 12-1. Power Query Ribbon Options

Option

Comments

Online Search

Search for data from public sources.

From Web

Import tabular data from web pages.

From File

Import data from CSV, text, XML, and Excel files.

From Database

Import data from relational databases.

From Other Sources

Import data from a variety of sources.

From Table

Import data from an Excel table to the current workbook.

Merge

Join two queries to add columns from one query output to another.

Append

Add data from an identically structured query output to the selected query.

Workbook

Display—or hide—the list of queries in the current workbook.

Shared

Display any shared queries.

Locale

Define the regional settings used by Power Query.

Fast Combine

Bypass security to accelerate data loads.

Data Source Settings

Manage data source settings.

Update

Update Power Query with the latest version.

Options

Choose Power Query options.

Sign In

Sign in to Power BI to access shared queries and data.

Data Sources

Power Query groups potential data sources into the following categories:

  • Web Pages—This option lets you see all available data that is formatted as tables in a web page. Then you can import tables of data from these pages.
  • File Sources—Include Excel (both from the current worksheet and other workbooks), CSV (comma separated values) files, text files, and XML files.
  • Databases—A fairly comprehensive collection of relational databases that are current in the workplace and in the cloud, including (among others) SQL Server, MS Access, and Oracle. The full list is given later on in this chapter.
  • Other Sources—A considerable and ever-growing range of data sources from Facebook to MS Exchange. The full list is given later on in this chapter.

You can also list the contents of folders on any available local disk or network share (even if it is not always a data source) and then leverage this to import files. Similarly (if you have the necessary permissions), you can list the databases and data available on the database servers you connect to. This way Power Query can provide not only the data, but also the metadata—or data about data—which can help you get the job done.

Unfortunately the sheer range of data sources from which Power Query can read data is such that we do not have space here to examine the minutiae of every one. As a consequence, we will take a rapid tour of some of the most frequently used data sources in the next few pages. You will probably be using some types of data source much more than others including, I hope, those that are outlined here. What is more, many of the data sources can be used in a similar way, and the Power Query interface does a wonderful job of making the arcane connection details as unobtrusive as possible. As a result, once you have grasped the way in which Power Query connects to data sources in general, it should not be difficult for you to use some of the more rare connections types without detailed explanation.

File Sources

Sending files across networks and over the Internet or via email has become second nature to most of us. As long as the files that you have obtained conform to some of the widely-recognized standards currently in use (of which you will learn more later), you should have little difficulty in loading them into Power Query.

The file sources that Power Query can currently read and from which it can load data are given in Table 12-2.

Table 12-2. File Sources

File Sources

Comments

Excel

Allows you to read Microsoft Excel files (versions 97 to 2013) and load worksheets, named ranges, and tables

CSV

Lets you load text files that conform to the CSV (comma separated values) format

XML

Allows you to load XML data

Text

Lets you load text files using a variety of separators

Databases

Much corporate data currently resides in relational databases. It follows that being able to look at this data is essential for much of today’s business intelligence. In the real world, connecting to corporate data will require, at the very least, you to have a logon name and possibly a password that will let you connect. I imagine that it will also require permissions to read the tables and views (which, without going into the grisly details, are essentially the same thing for Power Query). So the techniques described here are probably the easy bit. The hard part is having to be pleasant with the guardians of corporate data so that they concede that you actually need the data and should be allowed to see it.

The databases that Power Query can currently connect to, and can preview and load data from, are given in Table 12-3.

Table 12-3. Database Sources

Database

Comments

SQL Server

Lets you connect to a Microsoft SQL Server on-premises database and import records from all the data tables and views that you are authorized to access

Windows Azure SQL Database

Lets you connect to a Microsoft SQL Server cloud-based database and import records from all the data tables and views that you are authorized to access

Access Database

Lets you connect to a Microsoft Access file on your network and load queries and tables

Oracle Database

Lets you connect to an Oracle database and import records from all the data tables and views that you are authorized to access

DB2 Database

Lets you connect to an IBM DB2 database and import records from all the data tables and views that you are authorized to access

MySQL

Lets you connect to a MySQL database and import records from all the data tables and views that you are authorized to access

PostgreSQL

Lets you connect to a PostgreSQL database and import records from all the data tables and views that you are authorized to access

Sybase

Lets you connect to a Sybase database and import records from all the data tables and views that you are authorized to access

Teradata

Lets you connect to a Teradata database and import records from all the data tables and views that you are authorized to access

Connecting to Oracle, DB2, MySQL, PostgreSQL, Sybase, or Teradata will require not only that the database administrator has given you the necessary permissions, but also that connection software (known as drivers or providers) has been installed on your PC. Given the “corporate” nature of the requirements, I suggest that you talk directly to your IT department to get this set up.

Other Sources

Up until now we have seen some of the more traditional sources of data that you might need for your analysis. As the world changes, the available sources evolve, and the current trend is toward less “structured” (and controlled) data sources and toward more varied and often less corporate sources.

Power Query can connect to, and read data from, a whole host of these less classic sources. Those to which it can currently connect are listed in Table 12-4.

Table 12-4. Other Sources

Source

Comments

SharePoint List

Loads a SharePoint List as a data table. You will need SharePoint permissions to access SharePoint data.

OData Feed

Connects to an OData feed to read and load the data it contains. OData is a standardized protocol for creating and consuming data, especially over the Internet.

Windows Azure Marketplace

Lets you load data that you are authorized to access on the Windows Azure Marketplace. This will require a Windows Azure Marketplace subscription.

Hadoop File

Reads Hadoop (“Big Data”) files.

Windows Azure HDInsight

Reads cloud-based Hadoop files in the Microsoft Azure environment.

Windows Azure Blob Storage

Reads from a cloud-based unstructured data store.

Windows Azure Table Storage

Reads from Windows Azure tables.

Active Directory

Reads data from the Enterprise Active Directory. This will probably require custom access rights.

MS Exchange

Reads data from the MS Exchange Email system.

Facebook

Reads Facebook data.

These data sources are so varied, and often personal, that I will not be going through them here. Suffice it to say that the connection principles and approach are as identical as possible to those that you will see in the next few pages. So it is up to you to try connecting to this brave new world and using the data it offers.

Loading Data

It is time to start looking at the heavy lifting aspect of Power Query, and how you can use it to load data from a variety of different sources. As always, I will begin on the bunny slopes with a simple example of “scraping” data from a web page. Then, given the plethora of available data sources, and so we can  better give  the process some structure, we will begin by loading the five tables that make up the data set you used in the chapters on Power View and PowerPivot. However, each table will come from one of the classic data sources that currently can be found in most workplaces:

  • CSV—This file type will be the source of the Countries table.
  • Text—Here we will use the Date table.
  • XML—We will use an XML file containing the Colours data.
  • Excel—We will use the SalesData table from both an Excel file on disk and an open file.
  • Access—Finally we will load the Clients table from an Access database file.

After we have loaded these five tables, we will look into getting data from a relational database—SQL Server will be the example here—as databases are, in my experience, a frequent source of core data for analysis, and we can apply several tricks and techniques to database sources. Finally, we will look at a technique for loading multiple files of the same format, as this can be a frequent requirement.

Web Pages

As a first and extremely simple example, let’s grab some data from a web page. Since I want to concentrate on the method rather than the data, I will use a web page that has nothing to do with the sample data in the book, and that we will not be using at all, other than as a simple introduction to the process of loading data using Power Query.

  1. In Excel, click on the Power Query tab to activate the Power Query ribbon.
  2. Click From Web. The From Web dialog will be displayed.
  3. Enter the following URL (it is a Microsoft help page for Power Query that contains a few tables of data): http://office.microsoft.com/en-gb/excel-help/guide-to-the-power-query-ribbon-HA103993930.aspx. Of course, if you have a URL that you want to try out, then feel free! The dialog will look something like Figure 12-2:

    9781430266167_Fig12-02.jpg

    Figure 12-2. The From Web source dialog

  4. Click OK. The Navigator window will appear to the right of the Excel worksheet. After a few seconds during which Power Query is connecting to the web page, the list of available tables of data in the web page will be displayed, as shown in Figure 12-3.

    9781430266167_Fig12-03.jpg

    Figure 12-3. The Navigator window

  5. Hover the mouse pointer over one of the tables in the Navigator window. The Peek popup (also known as the fly-out) will appear to show you what the data in the chosen table looks like. This is shown in Figure 12-4:

    9781430266167_Fig12-04.jpg

    Figure 12-4. The Peek popup

  6. Select Table 4 in the Navigator window.
  7. Click Edit at the bottom of the window (or double-click the table name). The Power Query window will open displaying the table of data loaded into Power Query. It should look like Figure 12-5. We will be looking at this window in detail later in this chapter.

    9781430266167_Fig12-05.jpg

    Figure 12-5. The Power Query window

  8. Click Apply & Close in the Power Query Home ribbon. The Power Query window will close and copy the data into a new Excel worksheet. Once back in Excel you will see not only the data but also the WorkBook Queries window.

In a few simple steps you have seen just what Power Query can do and how it is done. You have found data and loaded it into Excel—a complete ETL process in a few clicks. In a very short time, you have seen lots of things happen and a whole new world open up in front of you. Consequently, I think that it would be a good idea to explain some of these new elements before going any further.

The Navigator Window

The Navigator window will appear when connecting to many, but not all, data sources. It is there to let you

  • Take a quick look at the available data tables in the data source.
  • Look at the data in individual tables.
  • Select one or more data tables to load into Power Query and from there into Excel or PowerPivot.

Depending on the data source to which you have connected, you might see only a few data tables in the Navigator window, or hundreds of them. In any case, what you can see are the structured data sets that Power Query can recognize and is confident that it can import. Equally dependent on the data source is the level of complexity of what you will see in the Navigator window. If you are looking at a database server, for instance, then you may start out with a list of databases, and may need to dig deeper into the arborescence of the data by expanding databases to list the available data tables and views.

The Peek Popup

The Peek popup (also known as the fly-out) is, as its name implies, a preview of the data in a table. It provides

  • A brief overview of the top few records in any of the datasets that you want to look at. Given that the data you are previewing could be hundreds of columns wide and millions of rows long, there could be scroll bars for the data table visible inside the Peek popup.
  • A list of the available columns in the data table. These are shown at the bottom of the Peek popup.

image Note   The Peek popup is a brilliant data discovery tool. Without having to load any data, you can take a quick look at the data source and any data that it contains that can probably be loaded by Power Query into Excel or PowerPivot. You can then decide if it is worth loading, and so you do not waste time on a data load that could be superfluous.

The Peek popup will only show you a tiny snippet of the available data. This is deliberate because it is only designed to provide a preview. One useful trick, if you have a table containing many columns, is to click on a column name at the bottom of the Peek popup. The chosen column will appear in the window so that you can see the data it contains.

The WorkBook Queries Window

Power Query will not forget how it loaded data. Each separate load process is remembered as a “query” and appears in the WorkBook Queries window, which is normally to the right of the current workbook. Each query in the WorkBook Queries window has a name and a last activated date, as well as an indication of whether it loads data into Excel or PowerPivot or remains a query process, but without actually loading the data anywhere from Power Query.

The WorkBook Queries window also has a Peek popup. So if you hover the cursor over a query, you will see what the data returned by this query will look like. If you want to go back to a query to continue modifying it, all you have to do is double-click on it, and you will open the Power Query window and see the data as you left it. Figure 12-6 shows the WorkBook Queries window in more detail.

9781430266167_Fig12-06.jpg

Figure 12-6. The WorkBook Queries window

Since you will be using the various possibilities of the WorkBook Queries window in your day-to-day work using Power Query, Table 12-5 explains the available options in the context menu that appears when you right-click on a WorkBook query.

Table 12-5. Query Options

Option

Description

Edit

Opens the Power Query window where you can modify the query. Double-clicking will also carry out this action. This is also possible by clicking on Edit in the Peek popup.

Refresh

Runs the query and reprocesses the data.

Duplicate

Makes a copy of the query.

Reference

Creates a new query that references the selected query.

Delete

Deletes the query. You will be asked for confirmation. This is also possible by clicking on Delete in the Peek popup.

Merge

Lets you join the data from two queries. This is described in the section “Merge.”

Append

Lets you add identically structured data from another query to the selected query.

Share

Lets the user share this query (and its output) using Power BI. This is described in Chapter 15.

Show the Peek

Displays the Peek popup.

If you wish to rename a query, you can do this from inside the Power Query window.

image Note   If you have returned data to a worksheet, and then you delete the worksheet, you will also delete the query—without any warning.

CSV Files

The scenario is as follows: you have been given a comma-separated text file (also known as a CSV file) containing a list of data. You now want to load this into Power Query so that you can look at the data and consider what needs doing (if anything) to make it useable. Here is what you have to do:

  1. In the Power Query ribbon, click From File and select From CSV in the popup menu. The Browse dialog will appear.
  2. Navigate to the folder containing the file and select the file (C:HighImpactDataVisualizationWithPowerBICountries.csv, in this example).
  3. Click OK. The Power Query window will open and display the contents of the table. This is shown in Figure 12-7.

    9781430266167_Fig12-07.jpg

    Figure 12-7. The Power Query window

  4. Uncheck the Load To Worksheet box at the bottom right of the Power Query Query Settings pane at the right of the Power Query window, because we do not want to load the data yet.
  5. Enter a name for this query in the Name box of the Power Query Query Settings pane. I suggest Countries as the query name.
  6. Select Apply & Close from the Power Query Home ribbon. You will return to Excel. You should now see the query appear in the WorkBook Queries window at the right of the Excel window.

And that, for the moment, is that. You have loaded the file into Power Query and, in the next chapter, you will soon learn how to shape the data and load it into either Excel or PowerPivot. If you really want to jump ahead to this discussion now, all is explained in the sections “Dataset Shaping” and “Data Cleansing and Modification” in Chapter 13.

What Is a CSV File?

Before we move on to other file types, there are a few comments I need to make about CSV files. There is a technical specification of what a “true” CSV file is, but I won’t bore you with that. What’s more many programs that generate CSV files do not always follow the definition exactly. What matters is that Power Query can handle files that

  • Have a .csv extension (it uses this by default to apply the right kind of processing).
  • Use a comma to separate the elements in a row. This too is a default that can be overridden, as you will see at the end of the next chapter.
  • End with a line feed, carriage return, or line feed/carriage return.
  • Can, optionally, contain double quotes to encapsulate fields. These will be stripped out as part of the data load process. If there are double quotes, they do not have to appear for every field, nor even for every record in a field that can have occasionally inconsistent double quotes.
  • Can contain “irregular” records, that is, rows that do not have every element that is found in a standard record. However the first row (whether or not it contains titles) must cover every element found in all the remaining records in the list. Put simply, any other record can be shorter than the first one but cannot be longer.
  • Do not contain anything other than the data table. If the file contains header rows or footer rows that are not part of the data, then Power Query cannot load the data table without further work. There are workarounds to this all-too-frequent problem, and one is given in the following chapter.

Text Files

If you have followed the process for loading a CSV file in the previous section, then you will find it considerably similar to loading a text file. This is not surprising. Both are text files, and both should contain a single list of data. The core differences are these:

  • A text file can have something other than a comma to separate the elements in a list. You can specify the delimiter when defining the load step.
  • A text file should normally have the extension .txt (though this, too, can be overridden).
  • A text file must be perfectly formed. That is, every record (row) must have the same number of elements as every other record.
  • A text file, too, must not contain anything other than the data table if you want a flawless data load first time.
  • If a text file encounters difficulties, it should import the data as a single column that you can then try and split up into multiple columns as described in chapter 13.

Here, then, is how to load a text file into Power Query:

  1. In the Power Query ribbon, click From File and select From Text in the popup menu. The Browse dialog will appear.
  2. Navigate to the folder containing the file and select the file (C:HighImpactDataVisualizationWithPowerBIDateTable.txt, in this example).
  3. Click OK. The Power Query window will open and display the contents of the table.
  4. Enter a name for this query in the Name box of the Power Query Query Settings pane. I suggest DateTable.
  5. Uncheck the Load To Worksheet box at the bottom right of the Power Query window, as we do not want actually to load the data yet; then select Apply & Close from the Power Query Home ribbon to return to Excel. You will see that this query has been added to the list of queries in this worksheet.

Where Power Query is really clever is that it can make a very educated guess as to how the text file is designed. That is, it can nearly always guess the field separator (the character that isolates each element in a list from the other elements) and so not only will it break the list into columns, but it will also avoid importing the column separator.

XML Files

XML, or Extensible Markup Language, is a standard means of sending data between IT systems. Consequently you have every chance of having to load an XML file one day. Although an XML file is just text, it is text that has been formatted in a very specific way, as you can see if you ever open a XML file in a text editor such as notepad. To load an XML file,

  1. In the Power Query ribbon click From File and select From XML in the popup menu. The Browse dialog will appear.
  2. Navigate to the folder containing the file and select the file (C:HighImpactDataVisualizationWithPowerBIColoursTable.xml, in this example).
  3. Click OK. The Power Query window will open and display the contents of the XML file.
  4. Display the Power Query Navigator window on the left of the Power Query window (unless it is already visible).
  5. Click on the Colours table (the second element) in the Navigator window. The contents of the table will be displayed. The Power Query window should look like Figure 12-8.

    9781430266167_Fig12-08.jpg

    Figure 12-8. The Power Query window after loading an XML file

  6. Uncheck the Load to Worksheet box at the bottom right of the Power Query window, since we do not want actually to load the data yet; then select Apply & Close from the Power Query Home ribbon to return to Excel.

The actual internal format of an XML file can get extremely complex. Sometimes an XML file will contain only one data table, sometimes it will contain many data separate tables. On other occasions it will contain one table whose records contain nested levels of data that you need to handle by expanding or aggregating. These techniques are described later in this chapter in the context of database sources.

Excel

If you are using Power Query you are probably already a major Excel user and have many, many spreadsheets full of data that you want to rationalize and use for analysis and presentation. So, let’s see how to load the contents of an Excel file. To begin with, I will show you how to get data from another workbook:

  1. In the Power Query ribbon click From File.
  2. In the Popup list, select From Excel. The Browse dialog will appear.
  3. Navigate to the directory containing the file that you want to look at (C:HighImpactDataVisualizationWithPowerBI, in this example).
  4. Select the source file (SalesData.xlsx, in this example) and click OK. The Navigator window will appear, showing the worksheets, tables, and ranges in the workbook file.
  5. Hover the mouse pointer over one of the tables in the Navigator window. The Peek popup will appear to show you what the data in the chosen table looks like.
  6. Click on the Sales2012_2013 table. The Power Query window will appear.
  7. Uncheck the Load To Data Model box at the bottom right of the Power Query window, since we do not want actually to load the data yet, then Select Apply & Close from the Power Query Home ribbon to return to Excel.

As you can see from this simple example, having Power Query read Excel data is really not difficult. However, you might still be wondering about a couple of things that you saw during this process, so here are some anticipatory comments:

The Navigator window will display

  • Worksheets
  • Named Ranges
  • Named Tables

Sometimes these can, in effect, be duplicate references to the same data, so you should really use the most precise data source that you can. For instance, I advise using a named table or a range name rather than a worksheet source, as the latter could easily end up containing “noise” data, which would make the load process more complex than it really needs to be.

Power Query will list, and use, data connections in a source Excel workbook if the data connection is active and has returned data to the workbook. Once a link to Power Query has been established, you can delete the data table itself in the source Excel workbook—and still use Power Query to load the data over the data connection in the source workbook.

Power Query will not take into account any data filters on an Excel data table. Consequently you will have to reapply any filters (of which you’ll learn more later) in Power Query if you want to subset the source data.

Excel Data from the Current Workbook

Loading a table from an existing worksheet into Power Query is, if anything, easier than the process that you just saw, but is, nonetheless, slightly different. Here is what you have to do:

  1. Select the table of data in Excel that you want to process in Power Query.
  2. Click the From Table button in the Power Query ribbon. The From Table dialog will appear, as shown in Figure 12-9.

    9781430266167_Fig12-09.jpg

    Figure 12-9. Loading data from the current spreadsheet—the From Table dialog

  3. If your table has headers, leave the check box selected (or uncheck it if there is not a header row).
  4. Click OK. The Power Query window will open.

The source list will be converted into an Excel table and its data loaded into Power Query.

Microsoft Access Databases

Another well-used data repository that proliferates in many corporations today is Microsoft Access. It is a powerful relational desktop database and can contain multiple tables, each containing millions of records. So we need to see how to load data from this particular source.

  1. In the Power Query ribbon, click From Database and select From Access Database in the popup menu. The Browse dialog will appear.
  2. Navigate to the folder containing the file and select the file (C:HighImpactDataVisualizationWithPowerBICarSales.accdb, in this example).
  3. Click OK. The navigator window will appear listing all the tables and queries in the Access database.
  4. Click on the Colours table and click Edit. The Power Query window will open and display the contents of the table. The table name will become the query name.
  5. Uncheck the Load To Worksheet box at the bottom right of the Power Query window, since we do not want actually to load the data yet, then Select Apply & Close from the Power Query Home ribbon to return to Excel. You will see that this query has been added to the list of queries in this worksheet.

I am sure that you can see a pattern emerging here. Indeed, this pattern will continue as you progress to loading tables from relational databases in a few pages time.

image Note   Power Query cannot see linked tables, only imported tables or tables that are actually in the Access database. It can, however, read queries overlaid upon native, linked, or imported data.

Loading Multiple Tables

An Access database can contain hundreds of tables or queries (which, like database views, can be considered as if they are tables). So what if you want to import several tables at once? No problem:

  1. Follow steps 1 through 3 of the procedure in the preceding section.
  2. Once the Navigator window is open and has finished loading the list of tables, check the Select Multiple Items box at the top of the window. A check box will appear to the left of every table and query in the Navigator.
  3. Select all the data tables and queries that you want to load into Power Query.
  4. Continue with the load as described previously.

Each table or query in Access will become a separate Power Query query and will be listed in the WorkBook Queries window. Each query will take the name of the table it has loaded into Power Query. If you want to see the contents of the query in the Power Query window and continue transforming the data, then all you have to do is to double-click on the query.

Loading the List of Tables

Another option that you may find useful is loading the list of tables and queries into Power Query rather than the data. There are two potential reasons for this:

  • The database contains hundreds of data tables and/or queries, and you want to keep the list of this metadata (as it is called).
  • Once you have saved the metadata as a query, you can then use it to preview the tables and apply data transformations (which follow in the next chapter) from inside the “metadata” query.

Here is how to do this:

  1. Follow steps 1 through 3 in the earlier “Microsoft Access Databases” section.
  2. Click on the database name in the Navigator window, not on a table or query name.
  3. Click Edit. The Power Query window will open and display the contents of the Access database.

How you can use this list of tables to load data is handled later in this chapter in the section “Database Metadata”. For the moment, let’s just say that it is a smart way to get a handle on your data.

Relational Databases

Enterprise relational databases still hold much of the world’s data, so you really need to know how to tap into the potentially vast mines of information that they contain. The bad news is that there are many, many databases out there, each with their intricacies and quirks. The good news is that once you have learned to load data from one of them, you should be able to use any of them.

As Power BI and Power Query are core Microsoft technologies, I will use the Microsoft enterprise relational database—SQL Server—as an example to show the principles of database access. The first advantage of this setup is that you probably do not need to install any software to enable access to SQL Server (though this is not always the case, so talk this through with your IT department). A second advantage is that the techniques are pretty similar to those used and applied by Oracle, DB2, and the other databases to which Power Query can connect. Furthermore, you can load multiple tables or views (each as a separate query) from a database. To see this in action (and presuming you have created the database CarSalesData as described in Appendix A), take the following steps:

  1. In the Power Query ribbon, click the From Database button. The list of potential database sources will appear.
  2. Click From SQL Server Database. The Microsoft SQL Database dialog will appear.
  3. Enter the server name in the Server text box. The dialog will look like Figure 12-10.

    9781430266167_Fig12-10.jpg

    Figure 12-10. The Microsoft SQL Database dialog

  4. Enter the database name; if you are using the sample data, it will be CarSalesData.
  5. Click OK. Power Query will connect to the server and will display the Navigator window containing all the tables and views in the database that you have permission to see on the server you selected.
  6. Check the Select Multiple Items check box.
  7. Click on the chek boxes for the SalesData and Countries tables. These two tables will appear in the Selected Items box in the Navigator. The Navigator window will look like Figure 12-11.

    9781430266167_Fig12-11.jpg

    Figure 12-11. The Navigator window when selecting multiple items

  8. Click on a table name in the Selected Items box. I will choose SalesData in this example.
  9. Click Edit in the Navigator window. The Power Query window will open and display the contents of the SalesData table. The table name will become the query name.
  10. Uncheck the Load to Worksheet box at the bottom right of the Power Query window, as we do not want actually to load the data yet; then select Apply & Close from the Power Query Home ribbon to return to Excel. You will see that this query has been added to the list of queries in this worksheet.

Since this is very similar to the way in which you loaded data from Access, I imagine that you are getting the hang of things by now. As always, though, there are a few comments to make.

First, let’s cover the initial connection to the server. The options are explained in Table 12-6.

Table 12-6. Database Connection Options

Option

Comments

Server

You cannot browse to find the server and need to type or paste the server name. If the server has an instance name, you need to enter the server and the instance.

Database

If you know the database, then you can enter (or paste) it here. This will restrict the number of available tables in the Navigator window and will make finding the correct table or view easier.

SQL Statement

If you enter an SQL SELECT statement that is fully qualified—that is, it contains the database reference if you have not added this already in the dialog—then Power Query will open the Power Query window and load the table directly.

image Note   These options will vary depending on the make of database you are connecting to.

Second, remember that the world of relational databases is huge; much, much more could be said at this point. I am afraid that I simply do not have the space to devote to all the subtleties of how you can use the available relational database sources, however. One important point to note is that you can use stored procedures (with SQL Server or Oracle, for instance) to return data. These are complex pieces of business logic that are stored in a database. To call a SQL Server stored procedure, you would enter the following elements into the Microsoft SQL Database dialog:

  • Server: <your server name>
  • Database: <the database name>
  • SQL Statement: EXEC <enter the schema (if there is one, followed by a period) and the stored procedure name>

This way, either you or your IT department can create complex and secure ways to allow data from the corporate databases to be read into Power Query from enterprise databases.

Finally, you can create multiple workbook queries at once if you have selected multiple items by clicking the Load button. This will create a query for each selected table which you can modify and extend later.

Database Metadata

There could well be occasions when you want to connect to a database, but not yet load the data. Perhaps you want merely to memorize the connection parameters, or alternatively, you want to examine the data more thoroughly. This is possible if you load not a table, but the database metadata, into Power Query, like this:

  1. Using the From Database button, load the Access database C:HighImpactDataVisualizationWithPowerBICarSales.accdb into the Navigator window.
  2. Click not on a table, but on the database name (CarSales.accdb) in the Navigator window.
  3. Click Edit. The database metadata will be loaded into the Power Query window.
  4. Expand the Navigator window of the Power Query window. You should see something like Figure 12-12.

    9781430266167_Fig12-12.jpg

    Figure 12-12. Database metadata

From here, you can look at the tables in the database and choose any single table that you wish to load by clicking on it in the Navigator window (or by right-clicking and selecting Drill-down in the table data). Creating a query like this, and then copying it to load the data tables, saves you from having to reconnect to the source repeatedly.

image Note   Here I was using MS Access as a demonstration. The same principles apply to all relational databases which Power Query can connect to.

Data Discovery or Data Load?

You have now seen how to load data from several different sources, although far from all of them. What you have been able to see, I hope, is that Power Query applies a common interface to the art and science of loading data, whatever the source.

Yet you may remember that, at the start of this chapter, I mentioned that Power Query is for discovering data as well as loading it. This covers the following points:

  • You can, for many data sources, peek at the data before you actually load it. This can save a considerable amount of wasted time by avoiding loading the wrong data.
  • When it comes to data outside the corporate environment, Power Query can help you search for data.

Peeking at Data

Only data sources containing multiple data tables can be previewed. If a source contains, by definition, only one data table, then you cannot “peek” at it but can only load it into Power Query and take a look at it there. As we have seen, Excel files, Access database files, and relational databases allow you to see the structure of the available data before you even see the data itself. However, I wanted to warn you first that the following file types cannot be previewed:

  • CSV Files—By definition, a CSV (comma separated values) file can only contain a single “table” of data. Indeed, if it contains anything else, you could have difficulties loading the file. So it simply loads directly.
  • Text Files—A text file is one that contains a single list of data where each element—or column, if you wish—is separated by a tab character, a pipe (|) character , or a semicolon, among others. This file, by definition, cannot contain anything else. So it simply loads directly, too.
  • Tables in the current Excel workbook—Because you can see any existing tables in the current workbook, you do not need to preview them.
  • XML files—Even though an XML file may contain multiple tables, it cannot be previewed. You will have to examine it in Power Query.

Searching for Data

There are already many thousands of publicly available data sources, and the number is growing at an incredible rate. It could well be that you need data from the public domain to add to your corporate data to provide comparisons, or perhaps you need it just on its own to make a point. Power Query can search for available public data and attempt to get you the information that you are looking for. As an example of this

  1. In the Power Query ribbon, click the Online Search button. The Online Search window will appear at the right of the spreadsheet window.
  2. Click inside the Search box at the top of the Online Search window and enter a search term. I will use Exchange Rates.
  3. Press Enter or click the Magnifying Glass icon to the right of the Search box. A list of available data sources will appear in the Search box, and the Search ribbon will appear. Hover the pointer over one of the results and the fly-out will appear to give you a preview of the data. This could look something like Figure 12-13.

    9781430266167_Fig12-13.jpg

    Figure 12-13. The Search window

The list of data sources could cover many pages of the Online Search window. You can flip from page to page by clicking the page number figures at the bottom of the window. Hovering the mouse pointer over a data source will display the fly-out showing the data. You can then load the data that interests you by right-clicking on it and selecting Edit.

Another facet of the Search facility is its ability to scan corporate data sources that have been made available using Power BI (this is explained in Chapter 15). If you or your colleagues have made searches available, and you have access rights, then Power Query will attempt to find these too.

The Search ribbon also lets you refine your search. Essentially you can

  • Choose one of the Scope buttons to expand or restrict the search to public and/or enterprise data.
  • Refine the search by searching specifically for words in the name or description of a search.

The Search ribbon is shown in Figure 12-14 and its options explained in Table 12-7.

9781430266167_Fig12-14.jpg

Figure 12-14. The Search ribbon

Table 12-7. Search Ribbon Options

Option

Description

All

Searches across all areas—public and organizational

My Shared

Only uses searches for queries that you have shared in Power BI.

Public

Searches for public data only

Organization

Searches for organizational data only

Name

Filters the search using the search name

Description

Filters the search using the search description

From

Filters the search to specific people or web sites

Data Source

Filters the search to specific sources of data

This Week

Filters the search to queries updated in a specific time frame

Column Names

Show results based on column name filtering

Certified

Only displays certified searches

Recent Searches

Lists any recent searches

Close Search

Closes the search window

image Note   Shared and organizational shared queries are explained in Chapter 15.

Conclusion

In this chapter you have seen how the latest addition to the Power BI toolset, Power Query, can help you find and load data from a variety of sources. These sources can be more traditional—such as Access, Excel, or text files—or they could come from big data repositories or social media sources. Indeed, data could even be found in public data repositories or from commercial cloud-based sources. Power Query will even help you find available data (from both inside and outside the enterprise) and remember any recent searches you have made. Yet this is only the first part of the story. Now you need to learn how to shape and tweak the data to prepare it for further use. This is the subject of the next chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.181.47