Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Adam Aspin

Data Mashup with Microsoft Excel Using Power Query and M

Finding, Transforming, and Loading Data from External Sources

1st ed.

../images/497001_1_En_BookFrontmatter_Figa_HTML.png

Adam Aspin

Stafford, UK

ISBN 978-1-4842-6017-3e-ISBN 978-1-4842-6018-0

https://doi.org/10.1007/978-1-4842-6018-0

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

Introduction

Analytics has become one of the buzzwords that define an age. Managers want their staff to deliver meaningful insight in seconds; users just want to do their jobs quickly and well. Everyone wants to produce clear, telling, and accurate analysis with tools that are intuitive and easy to use.

Microsoft recognized these trends and needs a few short years ago when they extended Excel with an add-in called Power Query. Once a mere optional extension to the world’s leading spreadsheet, Power Query is now a fundamental pillar of the Excel toolkit. It allows a user to take data from a wide range of sources and transform them into the base data that they can build on to add metrics, instant analyses, and KPIs to project their insights.

With Power Query, the era of self-service data access and transformation has finally arrived.

What Is Power Query?

Power Query is a tool that is used to carry out ETL. This acronym stands for Extract, Transform, Load . This is the sequential process that covers

Connecting to source data outside the current Excel workbook (or file if you prefer) and accessing all or part of the data that you need to bring into Excel. This is the extract phase of ETL.
Reshaping the data (the “data mashup” process) so that the resulting data is in a form that can be used by Excel. Essentially, this means ensuring that the data is in a coherent, structured, and complete tabular format. This is the transform phase of ETL.
Returning the data into Excel as a table in a worksheet or into the Excel/Power Pivot data model. This is the load phase of ETL.

These three phases make up the data ingestion process. So it is worth taking a short look at what makes up each one of them.

Connecting to Source Data

Gone are the days when you manually entered all the data you needed into a spreadsheet. Today’s data are available in a multitude of locations and formats—and are too voluminous to rekey.

This is where Power Query’s ability to connect instantly to 40-odd standard data sources is simply invaluable. Is your accounting data in MS Dynamics? Just connect. Is your CRM data in Salesforce? Just connect. Is your organization using a Data Lake?...you can guess the reply.

Yet this is only a small part of what Power Query can do to help simplify your analyses. For not only can it connect to a multitude of data sources (many of which are outlined in Chapters 1 through 5), it does this via a unified interface that makes connecting to data sources brilliantly simple. On top of this, you can use Power Query to preview the source data and ensure that you are loading exactly what you need. Finally, to top it all, the same interface is used for just about all of the available source data connections. This means that once you have learned to set up one connection, you have learned how to connect to virtually all of the available data sources.

In essence, part of Power Query is just another connection to external data. However, its unified data access interface, range of available data sources, and sheer simplicity will probably induce you to replace any data connections made using older technologies pretty quickly.

Data Transformation

Once you have established a connection to a data source, you may need to tweak the data in some way. Indeed, you may even need to reshape it entirely. This is the data mashup process—and it is the area where Power Query shines.

Power Query can carry out the simplest data transformation tasks to the most complex data restructuring challenges in a few clicks. You can

Filter source data so that you only load exactly the rows and columns you need
Extend the source data with calculations or data extracted from existing columns of data
Cleanse and rationalize the data easily and quickly in a multitude of ways
Join or split source tables to prepare a logical set of data tables for each specific analytical requirement
Group and aggregate source data to reduce the quantity of data loaded into Excel
Prepare source data tables to become a usable data model

This list merely scratches the surface of all that Power Query can do to mash up your data. It is, without hyperbole, unbelievably powerful at transforming source data. Indeed, it can carry out data ingestion and transformation tasks that used to be the preserve of expensive products that required complex programming skills and powerful servers.

All of this can now be done using a code-free interface that assists you in taking the messiest source data and delivering it to Excel as limpid tables of information ready to work with. If you wish to become a Power Query super-user, then you can extend its possibilities using the built-in M language.

Loading into a Worksheet or the Data Model

This final phase is the easiest by far. It is simply a question of telling Power Query where to land the data. This can be one—or both—of

A worksheet: Power Query can place the data from each source query into a separate worksheet . Once in a worksheet, it is perfectly “normal” Excel data. From here on you can do what you want to the data in Excel just as you normally would using all the Excel techniques that you have learned over the years.
The data model: Also referred to, often, as the Power Pivot data model (which is the term that I prefer to use), this is an in-memory data store. It can handle many more rows of data than Excel—tens of millions in some cases—and is normally the basis for pivot table output in Excel. When dealing with large source datasets, it is often the ideal destination for data that you have accessed using Power Query, as it is compressed in memory (and consequently takes up less space when saved to disk) and can easily exceed the 1,048,576 row limit of Excel worksheets.

The data model and Power Pivot are extensive subjects in their own right, and this book will not be looking at either of them in detail.

Integrating Power Query into Daily Workflows

Power Query is completely integrated into the latest versions of Excel. This means that you can use it seamlessly as part of your daily routines when ingesting and analyzing data . Put simply:

Data source connections, transformation routines, and data loading into Excel are created once and can be reused whenever suits you.
You can trigger manual data refreshes at any time—and these can be total refreshes of every source connection in a workbook or refreshes of a single source if you prefer.
Data sources can be reused across different Excel workbooks.
Power Query processes can be copied between different Excel files.
Power Query processes (called queries) can be managed and extended with interactive parameters to create immensely powerful ETL processes.
Power Query–based data flows can be customized and extended using the built-in M language.

So, as is the case for nearly all your Excel-based work, you are likely to build once and use often.

The Evolution of Power Query

Power Query has evolved considerably over the years since it was first made available as a downloadable add-in for Excel 2010. It was still optional for Excel 2013 and only attained the status of being completely integrated into Excel by the 2016 version. Indeed, it suffered a name change at that point and was accessed under the heading “Get & Transform.”

Since around 2017—and since the Excel 2019 version—it has reverted to being Power Query once again. This is the version that is the subject of this book. This does not mean that you cannot use the techniques described in these pages with earlier versions of the product. However, it will mean that certain aspects of the Excel interface that you use to launch Power Query will be slightly different from those described in Chapters 1 through 5. These differences are essentially minor and should not present any difficulties to experienced Excel users.

This is made possible due to the fact that Power Query is accessed using a separate interface. It is called from inside Excel, but exists in its own parallel universe. This ensures a consistent look and feel whatever the version of Excel that you are using. The entry point into Power Query may change with Excel versions—but the product itself remains the same. Just remember that the range of available data sources will depend on the version of Excel that you are using. Some of the “enterprise-level” data sources are only available in Pro and Enterprise subscriptions to Excel.

How to Use This Book

If you wish, you can read this book from start to finish as it is designed to be a progressive tutorial that will help you to learn Power Query. However, as Power Query is composed of four main areas, this book is broken down into four sets of chapters that focus on the various key areas of the product. It follows that you can, if you prefer, focus on individual topics in Power Query without having to take a linear approach to reading this book.

Chapters 1 through 5 show you how to connect to a range of varied data sources and bring this data into Excel using Power Query. Depending on the source data that you need to use, you may only need to dip into parts of these chapters to find guidance on how to use a specific source data type.
Chapters 6 through 9 explain how to transform and clean data so that you can use it for analysis. These data transformations range from the extremely simple to the potentially complex. Indeed, they are as potentially vast as data itself. You may never need to apply all of the extensive range of data modification and cleansing techniques that Power Query can deliver—but just about everything that it can do is explained in detail in these chapters.
Chapters 10 and 11 explain how to tame the real world of data loading and transformation. Here you will learn how to organize and manage your queries, as well as how to add parameters to make them more interactive and resilient.
Chapter 12 introduces you to M—the language that Power Query uses to transform your data. Using M you can push your data ingestion and transformation routines to new heights that are simply not possible using just the Power Query interface.

On to Learning Power Query

This book comes with a small set of sample data that are used to create the examples that are used throughout the book. I realize that it may seem paradoxical to use a tiny dataset for a product that can handle tens of millions of rows of data, but I prefer to use a comprehensible set of source data so that the reader can concentrate on what is being learned, rather than the data itself.

It is inevitable that not every question can be anticipated and answered in one book. Nonetheless, I hope that I have answered many of the data ingestion and transformation questions that you might encounter and—more importantly—have given you the approaches and the confidence to resolve most of the Power Query challenges that you might meet when applying this product to solve real-world problems.

As a final point, the information on “pure” Power Query in Chapters 6–12 is independent of Excel. So if you are learning Power Query in Power BI, SQL Server Integration Services, or the Power BI Service, you can find a wealth of relevant information to assist you in your data transformation projects.

I wish you good luck in using Power Query, and I sincerely hope that you have as much fun using it as I did in writing this book.

Acknowledgments

Writing a technical book can be a lonely occupation. So I am all the more grateful for all the help and encouragement that I have received from so many fabulous friends and colleagues.

First, my considerable thanks go to Jonathan Gennick, the commissioning editor of this book. Throughout the publication process, Jonathan has been both a tower of strength and an exemplary mentor. He has always been available to share his vast experience selflessly and courteously.

Heartfelt thanks go to Jill Balzano, the Apress coordinating editor, for calmly managing this book through the production process. She succeeded—once again—in the well-nigh impossible task of making a potentially stress-filled trek into a pleasant journey filled with light and humor. Her team also deserves much praise for their efficiency under pressure.

I also owe a debt of gratitude to my wife, Karine, for her time and effort spent reviewing this book. Being a technical reviewer is a thankless task, but I want to say a heartfelt “thank you” to her for the range and depth of her comments and for picking up so much that otherwise would have gone unnoticed. The book is a better one thanks to her efforts.

My thanks also go to Ann Gemer Tuballa for her tireless and subtle work editing and polishing the prose and to the team at SPi Global for the hours spent preparing the book for publishing.

Table of Contents

Chapter 2: Discovering and Loading File-Based Data with Power Query 31

File Sources 31

Loading Data from Files 33

CSV Files 33

Text Files 37

Text and CSV Options 38

Fixed-Width Text Files 40

Simple XML Files 41

Excel Files 43

Why Use Power Query to Connect to Excel 47

From Table/Range 48

Microsoft Access Databases 49

JSON Files 51

Conclusion 51

Chapter 3: Loading Data from Databases and Data Warehouses 53

Relational Databases 54

SQL Server 56

Automatically Loading Related Tables 60

Database Options 61

Searching for Tables 64

Oracle Databases 71

Other Relational Databases 74

Microsoft SQL Server Analysis Services Data Sources 76

From Analysis Services 80

SSAS Tabular Data Warehouses 83

Types of Credentials When Connecting 87

Unable to Connect 87

Other Database Connections 88

Conclusion 88

Chapter 4: Loading Data from the Web and the Cloud 89

Web and Cloud Services 89

Web Pages 90

Online Services 90

Microsoft Azure 91

Web Pages 91

Advanced Web Options 93

Viewing the Source Web Page 94

Salesforce 95

Loading Data from Salesforce Objects 96

Salesforce Reports 102

Microsoft Dynamics 365 102

Azure SQL Database 104

Azure SQL Data Warehouse (Azure Synapse Analytics) 106

Connecting to SQL Server on an Azure Virtual Machine 108

Azure Blob Storage 110

Azure Security 112

Other Source Types 113

Conclusion 114

Chapter 5: Generic Data Sources 115

ODBC Sources 116

OLE DB Data Sources 125

OData Feeds 129

OData Options 132

Refreshing Data 132

Refreshing the Entire Data in the Excel In-Memory Model 133

Refreshing an Individual Query 133

Conclusion 134

Chapter 6: Structuring Imported Data 135

Extending Queries in Power Query 136

Editing Data After a Data Load 137

Transforming Data Before Loading 141

Query or Load? 141

The Power Query Editor 143

The Applied Steps List 144

The Power Query Editor Ribbons 145

Dataset Shaping 152

Renaming Columns 153

Reordering Columns 153

Removing Columns 155

Choosing Columns 156

Merging Columns 158

Moving to a Specific Column 161

Removing Records 162

Keeping Rows 163

Removing Duplicate Records 166

Sorting Data 167

Reversing the Row Order 168

Undoing a Sort Operation 168

Filtering Data 170

Selecting Specific Text Values 170

Finding Elements in the Filter List 171

Filtering Text Ranges 172

Filtering Numeric Ranges 173

Filtering Date and Time Ranges 174

Filtering Numeric Data 175

Applying Advanced Filters 177

Grouping Records 179

Simple Groups 179

Complex Groups 181

Count Rows 184

Saving Changes in the Query Editor 184

Exiting the Query Editor 184

Conclusion 185

Chapter 7: Data Transformation 187

Viewing a Full Record 188

Power Query Editor Context Menus 189

Using the First Row as Headers 190

Changing Data Type 192

Detecting Data Types 195

Data Type Indicators 195

Switching Data Types 196

Data Type Using Locale 197

Replacing Values 198

Transforming Column Contents 201

Text Transformation 201

Adding a Prefix or a Suffix 203

Removing Leading and Trailing Spaces 204

Removing Nonprinting Characters 204

Number Transformations 205

Filling Down Empty Cells 215

Extracting Part of a Column’s Contents 218

Advanced Extract Options 219

Duplicating Columns 221

Splitting Columns 222

Splitting Column by a Delimiter 223

Advanced Options for Delimiter Split 225

Splitting Columns by Number of Characters 226

Merging Columns 228

Creating Columns from Examples 229

Adding Conditional Columns 231

Index Columns 234

Conclusion 236

Chapter 9: Complex Data Loads 271

Adding Multiple Files from a Source Folder 272

Filtering Source Files in a Folder 275

Displaying and Filtering File Attributes 278

Removing Header Rows After Multiple File Loads 280

Combining Identically Structured Files 280

Loading and Parsing JSON Files 281

The List Tools Transform Ribbon 283

Parsing XML Data from a Column 284

Parsing JSON Data from a Column 286

Complex JSON Files 288

Complex XML Files 294

Convert a Column to a List 295

Reusing Data Sources 296

Pinning a Data Source 297

Copying Data from Power Query Editor 298

Conclusion 299

Chapter 11: Parameterizing Queries 321

Parameterizing Queries 321

Creating a Simple Parameter 322

Creating a Set of Parameter Values 324

Creating a Query-Based Parameter 326

Modifying a Parameter 328

Applying a Parameter When Filtering Records 329

Modifying the Current Value of a Parameter 331

Applying a Parameter to a Data Source 332

Other Uses for Parameters 334

Using Parameters in the Data Source Step 334

Applying a Parameter to a SQL Query 335

Query Icons 337

Conclusion 337

Chapter 12: The M Language 339

What Is the M Language? 340

M and the Power Query Editor 341

Modifying the Code for a Step 342

M Expressions 344

Writing M by Adding Custom Columns 345

The Advanced Editor 348

Expressions in the Advanced Editor 348

The Let Statement 350

Modifying M in the Advanced Editor 350

Syntax Checking 352

Basic M Functions 352

Text Functions 354

Number Functions 355

Date Functions 357

Time Functions 359

Duration Functions 359

M Concepts 360

M Data Types 361

M Values 362

Defining Your Own Variables in M 363

Writing M Queries 364

Lists 365

Creating Lists Manually 365

Generating Sequences Using Lists 367

Accessing Values from a List 367

List Functions 368

Records 368

Tables 370

Other Function Areas 372

Custom Functions in M 372

Adding Comments to M Code 374

Single-Line Comments 374

Multiline Comments 374

Conclusion 375

Appendix A: Sample Data 377

Downloading the Sample Data 377

Index 379

About the Author

Adam Aspin

is an independent business intelligence consultant based in the United Kingdom. He has worked with SQL Server for over 25 years. During this time, he has developed several dozen reporting and analytical systems based on the Microsoft Analytics Stack.

Business intelligence has been Adam’s principal focus for the last 20 years. He has applied his skills for a variety of clients in a range of industry sectors. He is the author of Apress books: SQL Server 2012 Data Integration Recipes, Pro Power BI Desktop (now in its third edition), Business Intelligence with SQL Server Reporting Services, and High Impact Data Visualization in Excel with Power View, 3D Maps, Get & Transform and Power BI.

A graduate of Oxford University, Adam began his career in publishing before moving into IT. Databases soon became a passion, and his experience in this arena ranges from dBase to Oracle, and Access to MySQL, with occasional sorties into the world of DB2. He is, however, most at home in the Microsoft universe when using SQL Server Analysis Services, SQL Server Reporting Services, SQL Server Integration Services, and Power BI—both on-premises and in Azure.

A fluent French speaker, Adam has worked in France and Switzerland for many years.

About the Technical Reviewer

Karine Aspin

is a principal consultant with Calidra Ltd., a UK-based data and analytics consultancy. A mathematics graduate of the Swiss Federal Institute of Technology, Karine has worked at a range of IT companies including IBM Global Services.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Front Matter

Create new playlist

Sign In

Sign Up

Data Mashup with Microsoft Excel Using Power Query and M

Finding, Transforming, and Loading Data from External Sources

What Is Power Query?

Connecting to Source Data

Data Transformation

Loading into a Worksheet or the Data Model

Integrating Power Query into Daily Workflows

The Evolution of Power Query

How to Use This Book

On to Learning Power Query

Table of Contents for
Front Matter