Analytics has become one of the buzzwords that define an age. Managers want their staff to deliver meaningful insight in seconds; users just want to do their jobs quickly and well. Everyone wants to produce clear, telling, and accurate analysis with tools that are intuitive and easy to use.
Microsoft recognized these trends and needs a few short years ago when they extended Excel with an add-in called Power Query. Once a mere optional extension to the world’s leading spreadsheet, Power Query is now a fundamental pillar of the Excel toolkit. It allows a user to take data from a wide range of sources and transform them into the base data that they can build on to add metrics, instant analyses, and KPIs to project their insights.
With Power Query, the era of self-service data access and transformation has finally arrived.
Connecting to source data outside the current Excel workbook (or file if you prefer) and accessing all or part of the data that you need to bring into Excel. This is the extract phase of ETL.
Reshaping the data (the “data mashup” process) so that the resulting data is in a form that can be used by Excel. Essentially, this means ensuring that the data is in a coherent, structured, and complete tabular format. This is the transform phase of ETL.
Returning the data into Excel as a table in a worksheet or into the Excel/Power Pivot data model. This is the load phase of ETL.
These three phases make up the data ingestion process. So it is worth taking a short look at what makes up each one of them.
Gone are the days when you manually entered all the data you needed into a spreadsheet. Today’s data are available in a multitude of locations and formats—and are too voluminous to rekey.
This is where Power Query’s ability to connect instantly to 40-odd standard data sources is simply invaluable. Is your accounting data in MS Dynamics? Just connect. Is your CRM data in Salesforce? Just connect. Is your organization using a Data Lake?...you can guess the reply.
Yet this is only a small part of what Power Query can do to help simplify your analyses. For not only can it connect to a multitude of data sources (many of which are outlined in Chapters 1 through 5), it does this via a unified interface that makes connecting to data sources brilliantly simple. On top of this, you can use Power Query to preview the source data and ensure that you are loading exactly what you need. Finally, to top it all, the same interface is used for just about all of the available source data connections. This means that once you have learned to set up one connection, you have learned how to connect to virtually all of the available data sources.
In essence, part of Power Query is just another connection to external data. However, its unified data access interface, range of available data sources, and sheer simplicity will probably induce you to replace any data connections made using older technologies pretty quickly.
Once you have established a connection to a data source, you may need to tweak the data in some way. Indeed, you may even need to reshape it entirely. This is the data mashup process—and it is the area where Power Query shines.
Filter source data so that you only load exactly the rows and columns you need
Extend the source data with calculations or data extracted from existing columns of data
Cleanse and rationalize the data easily and quickly in a multitude of ways
Join or split source tables to prepare a logical set of data tables for each specific analytical requirement
Group and aggregate source data to reduce the quantity of data loaded into Excel
Prepare source data tables to become a usable data model
This list merely scratches the surface of all that Power Query can do to mash up your data. It is, without hyperbole, unbelievably powerful at transforming source data. Indeed, it can carry out data ingestion and transformation tasks that used to be the preserve of expensive products that required complex programming skills and powerful servers.
All of this can now be done using a code-free interface that assists you in taking the messiest source data and delivering it to Excel as limpid tables of information ready to work with. If you wish to become a Power Query super-user, then you can extend its possibilities using the built-in M language.
A worksheet: Power Query can place the data from each source query into a separate worksheet . Once in a worksheet, it is perfectly “normal” Excel data. From here on you can do what you want to the data in Excel just as you normally would using all the Excel techniques that you have learned over the years.
The data model: Also referred to, often, as the Power Pivot data model (which is the term that I prefer to use), this is an in-memory data store. It can handle many more rows of data than Excel—tens of millions in some cases—and is normally the basis for pivot table output in Excel. When dealing with large source datasets, it is often the ideal destination for data that you have accessed using Power Query, as it is compressed in memory (and consequently takes up less space when saved to disk) and can easily exceed the 1,048,576 row limit of Excel worksheets.
The data model and Power Pivot are extensive subjects in their own right, and this book will not be looking at either of them in detail.
Data source connections, transformation routines, and data loading into Excel are created once and can be reused whenever suits you.
You can trigger manual data refreshes at any time—and these can be total refreshes of every source connection in a workbook or refreshes of a single source if you prefer.
Data sources can be reused across different Excel workbooks.
Power Query processes can be copied between different Excel files.
Power Query processes (called queries) can be managed and extended with interactive parameters to create immensely powerful ETL processes.
Power Query–based data flows can be customized and extended using the built-in M language.
So, as is the case for nearly all your Excel-based work, you are likely to build once and use often.
Power Query has evolved considerably over the years since it was first made available as a downloadable add-in for Excel 2010. It was still optional for Excel 2013 and only attained the status of being completely integrated into Excel by the 2016 version. Indeed, it suffered a name change at that point and was accessed under the heading “Get & Transform.”
Since around 2017—and since the Excel 2019 version—it has reverted to being Power Query once again. This is the version that is the subject of this book. This does not mean that you cannot use the techniques described in these pages with earlier versions of the product. However, it will mean that certain aspects of the Excel interface that you use to launch Power Query will be slightly different from those described in Chapters 1 through 5. These differences are essentially minor and should not present any difficulties to experienced Excel users.
This is made possible due to the fact that Power Query is accessed using a separate interface. It is called from inside Excel, but exists in its own parallel universe. This ensures a consistent look and feel whatever the version of Excel that you are using. The entry point into Power Query may change with Excel versions—but the product itself remains the same. Just remember that the range of available data sources will depend on the version of Excel that you are using. Some of the “enterprise-level” data sources are only available in Pro and Enterprise subscriptions to Excel.
Chapters 1 through 5 show you how to connect to a range of varied data sources and bring this data into Excel using Power Query. Depending on the source data that you need to use, you may only need to dip into parts of these chapters to find guidance on how to use a specific source data type.
Chapters 6 through 9 explain how to transform and clean data so that you can use it for analysis. These data transformations range from the extremely simple to the potentially complex. Indeed, they are as potentially vast as data itself. You may never need to apply all of the extensive range of data modification and cleansing techniques that Power Query can deliver—but just about everything that it can do is explained in detail in these chapters.
Chapters 10 and 11 explain how to tame the real world of data loading and transformation. Here you will learn how to organize and manage your queries, as well as how to add parameters to make them more interactive and resilient.
Chapter 12 introduces you to M—the language that Power Query uses to transform your data. Using M you can push your data ingestion and transformation routines to new heights that are simply not possible using just the Power Query interface.
This book comes with a small set of sample data that are used to create the examples that are used throughout the book. I realize that it may seem paradoxical to use a tiny dataset for a product that can handle tens of millions of rows of data, but I prefer to use a comprehensible set of source data so that the reader can concentrate on what is being learned, rather than the data itself.
It is inevitable that not every question can be anticipated and answered in one book. Nonetheless, I hope that I have answered many of the data ingestion and transformation questions that you might encounter and—more importantly—have given you the approaches and the confidence to resolve most of the Power Query challenges that you might meet when applying this product to solve real-world problems.
As a final point, the information on “pure” Power Query in Chapters 6–12 is independent of Excel. So if you are learning Power Query in Power BI, SQL Server Integration Services, or the Power BI Service, you can find a wealth of relevant information to assist you in your data transformation projects.
I wish you good luck in using Power Query, and I sincerely hope that you have as much fun using it as I did in writing this book.
Writing a technical book can be a lonely occupation. So I am all the more grateful for all the help and encouragement that I have received from so many fabulous friends and colleagues.
First, my considerable thanks go to Jonathan Gennick, the commissioning editor of this book. Throughout the publication process, Jonathan has been both a tower of strength and an exemplary mentor. He has always been available to share his vast experience selflessly and courteously.
Heartfelt thanks go to Jill Balzano, the Apress coordinating editor, for calmly managing this book through the production process. She succeeded—once again—in the well-nigh impossible task of making a potentially stress-filled trek into a pleasant journey filled with light and humor. Her team also deserves much praise for their efficiency under pressure.
I also owe a debt of gratitude to my wife, Karine, for her time and effort spent reviewing this book. Being a technical reviewer is a thankless task, but I want to say a heartfelt “thank you” to her for the range and depth of her comments and for picking up so much that otherwise would have gone unnoticed. The book is a better one thanks to her efforts.
My thanks also go to Ann Gemer Tuballa for her tireless and subtle work editing and polishing the prose and to the team at SPi Global for the hours spent preparing the book for publishing.
is an independent business intelligence consultant based in the United Kingdom. He has worked with SQL Server for over 25 years. During this time, he has developed several dozen reporting and analytical systems based on the Microsoft Analytics Stack.
Business intelligence has been Adam’s principal focus for the last 20 years. He has applied his skills for a variety of clients in a range of industry sectors. He is the author of Apress books: SQL Server 2012 Data Integration Recipes, Pro Power BI Desktop (now in its third edition), Business Intelligence with SQL Server Reporting Services, and High Impact Data Visualization in Excel with Power View, 3D Maps, Get & Transform and Power BI.
A graduate of Oxford University, Adam began his career in publishing before moving into IT. Databases soon became a passion, and his experience in this arena ranges from dBase to Oracle, and Access to MySQL, with occasional sorties into the world of DB2. He is, however, most at home in the Microsoft universe when using SQL Server Analysis Services, SQL Server Reporting Services, SQL Server Integration Services, and Power BI—both on-premises and in Azure.
A fluent French speaker, Adam has worked in France and Switzerland for many years.
is a principal consultant with Calidra Ltd., a UK-based data and analytics consultancy. A mathematics graduate of the Swiss Federal Institute of Technology, Karine has worked at a range of IT companies including IBM Global Services.
3.141.199.56