Preface

Microsoft is running a feedback forum for Excel on UserVoice where everybody can submit a new idea for others to vote on. The top voted feature request is “Python as an Excel scripting language”. It has more than double the votes of the second most voted feature request. What makes this combination so compelling and how can you start using Excel and Python together today? That is, in a nutshell, what this book is about.

The main driving force behind the “Python for Excel” story is the fact that we are living in a world of data. Nowadays, huge datasets are available to everybody and about everything. Often, these datasets are so big that they don’t fit into a spreadsheet anymore. A few years ago, this was referred to as big data, but nowadays, a dataset of a few million rows is really nothing special. Excel has evolved to cope with that trend: It introduced Power Query to load and clean datasets that don’t fit into a spreadsheet. It also introduced Power Pivot, an add-in to perform data analysis on these datasets and present the results. Power Query is based on the Power Query M formula language while Power Pivot defines formulas by using Data Analysis Expressions (DAX). If you also want to automate a few things in your Excel file then you would use Excel’s built-in automation language Visual Basic for Applications (VBA). That is, for something fairly simple, you can end up using VBA, M and DAX. One issue with this is that all these languages only serve you in Microsoft’s Excel and Power BI world (I will introduce Power BI briefly in Chapter 1).

Python, on the other hand, is a general-purpose programming language that has become one of the most popular choices amongst analysts and data scientists. If you use Python with Excel, you are able to use a programming language that is good at all aspects of the story, whether that’s automating Excel, accessing and preparing datasets or performing data analysis and visualization tasks. Most importantly, you can reuse your Python skills outside of Excel: if you need to scale up your computing power, you can simply move your quantitative model, simulation or machine learning application to the cloud, where practically unconstrained computing resources are waiting for you. Or you can turn your Excel tool into a web based dashboard with minimal efforts.

Why I Wrote This Book

Through my work on xlwings, the Excel automation package that we will meet in [Link to Come] of this book, I am in close contact with many users who use Python for Excel—whether that’s via the issue tracker on GitHub, a question on StackOverflow or at a physical event like a meetup or a conference.

On a regular basis, I am asked to recommend resources to get started with Python. While there is certainly no shortage of Python introductions, they tend to be either too general (nothing about data analysis) or too specific (full scientific introductions). However, Excel users tend to be somewhere in the middle: they certainly work with data, but a full scientific introduction may be too technical. They also tend to have specific requirements and questions that aren’t answered in any of the existing material. Some of these questions are:

  • How can I build a pivot table in Python?

  • What’s the equivalent to Excel’s AutoFilter?

  • Which Python-Excel package do I need for which task?

  • How do I move my Power Query database connection over to Python?

I wrote this book to get you from zero Python knowledge to be able to automate your Excel centric tasks and leverage Python’s data analysis and scientific computing tools in Excel without any detours.

Who This Book Is For

If you are an Excel user who wants to beat the limits of Excel with a modern programming language, this book is for you. Most typically, this means that you spend hours every month with downloading, cleaning and copy/pasting big amounts of data into mission-critical spreadsheets. While there are different ways to overcome Excel’s limits, this book will focus on how to use Python for this task.

This book doesn’t assume any coding experience and therefore contains a detailed introduction to all the tools we will use including an introduction to Python itself. If you are a seasoned VBA developer, you will find regular comparisons between Python and VBA that will allow you to ship around the common gotchas and hit the ground running.

This book can also be helpful if you are a Python developer and need to learn about the different ways that Python can deal with Excel and Excel files to be able to support your business users. In that case, you might want to start with Chapter 4 directly.

How This Book Is Organized

In this book, I will show you all aspects of the Python for Excel story which includes:

  • How can Python replace common Excel functionality using pandas, Python’s go-to package for data analysis

  • How can Python manipulate Excel files directly by using reader and writer packages

  • How can Python replace VBA macros and user defined functions1 (UDFs) by using automation packages

The first two topics will form [Link to Come] of this book while the last topic will form [Link to Come]. [Link to Come] will work on all platforms that Python supports, mainly Windows, macOS and Linux while [Link to Come] will only work on those platforms that Microsoft Excel supports, i.e. Windows and macOS.

Manipulating Workbooks vs. Programming Excel

In more detail, [Link to Come] will start with the fundamentals about Python and pandas, including how to setup your development environment. This will allow you to replace common Excel functionality with Python code that is usually much more efficient and allows you to work with datasets that don’t fit into an Excel spreadsheet. Up next are Python’s reader and writer packages for Excel files: these packages are able to read and write Excel workbooks directly from Python and as such replace the Excel application. This means that you don’t need an installation of Excel and in turn these packages work on any platform that Python supports. A typical use case for a reader package is to read in data from Excel files that you receive every morning from an external company or system and store their contents in a database. A typical use case for a writer package is to provide the functionality behind the famous “download to Excel” button that you find in almost every application. One characteristic of the reader packages is that Excel files need to be saved before a reader package can read the values.

[Link to Come] is about automation packages: You will learn how to run Python code that opens workbooks in Excel and manipulate them in front of your eyes. You won’t need to save the Excel files before you will be able to read out changed cell values. Using automation packages, you can use Python to build interactive Excel tools. These allow you to click a button to have Python perform a computationally expensive calculation, for example—something that you may have done previously with VBA macros. You’ll also learn how to write UDFs in Python instead of VBA. Here, you interact with what you see in the Excel application, not with files on disk. Figure P-1 compares the difference between manipulating workbooks and programming Excel visually.

excel python packages
Figure P-1. Manipulating workbooks (Part I) vs. programming Excel (Part II)

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This element signifies a tip or suggestion.

Note

This element signifies a general note.

Warning

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/fzumstein/python-for-excel. You can use this repository instead of typing all examples by yourself. To download, click on the green Code button, then select Download ZIP. Once downloaded, right-click the file on Windows and select Extract All... to unzip the contained files into a folder. On macOS, simply double-click the file to unzip. If you know how to work with Git, you could also use Git to clone the repository to your local hard disk. You can put the folder anywhere you want, but I will refer to it as follows in this book:

C:Usersusernamepython-for-excel

Accordingly, if you are on macOS or place the folder in a different directory, simply use your folder name instead. For [Link to Come] of this book, you can also run the Jupyter notebooks online without the need to download or install anything locally—I’ll give you a detailed description of how to do this in Chapter 2.

If you have a technical question or a problem using the code examples, please send email to .

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Python for Excel by Felix Zumstein (O’Reilly). Copyright 2020 Felix Zumstein, 978-1-492-08100-5.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

O’Reilly Online Learning

Note

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, conferences, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://www.oreilly.com/catalog/9781492081005.

Email to comment or ask technical questions about this book.

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

1 Microsoft has started to use the term custom functions instead of UDFs. In this book, I will keep calling them UDFs though.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.210.17