5 Apply Functions

Learning about .apply() is fundamental to the data cleaning process. It also encapsulates key concepts in programming, mainly writing functions. The .apply() method takes a function and applies it (i.e., runs it) across each row or column of a DataFrame, so you do not have to write the code for each element separately.
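
As a minimal sketch of that idea, the toy DataFrame and the helper `spread` below are hypothetical names made up for illustration. By default, `DataFrame.apply()` calls the function once per column; passing `axis=1` calls it once per row instead.

```python
import pandas as pd

# toy data, made up for illustration
df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

def spread(values):
  """Return the largest value minus the smallest value."""
  return values.max() - values.min()

# default (axis=0): the function receives each column as a Series
by_column = df.apply(spread)

# axis=1: the function receives each row as a Series
by_row = df.apply(spread, axis=1)
```

In both cases the function is written once and pandas takes care of handing it each column or each row in turn.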

If you’ve programmed before, then the concept of an apply should be familiar. It is similar to writing a for loop across each row or column and calling the function, or making a map() call to a function. In general, this is the preferred way to apply functions across DataFrames, because it is typically faster than writing a for loop in Python.
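
The comparison can be sketched on a single column. The function `my_sq` and the toy Series are assumptions made for this illustration; the three approaches below are meant to produce the same values.

```python
import pandas as pd

def my_sq(x):
  """Square a single value."""
  return x ** 2

s = pd.Series([10, 20, 30])

# 1. an explicit for loop that calls the function on each element
looped = []
for value in s:
  looped.append(my_sq(value))

# 2. the built-in map() function
mapped = list(map(my_sq, s))

# 3. the .apply() method on the Series
applied = s.apply(my_sq)
```

The loop and `map()` versions return plain Python lists, while `.apply()` returns a new Series that keeps the original index.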

If you haven’t programmed before, then prepare to see how we can incorporate custom calculations that are easily repeated across our data.

Learning Objectives

The concept map for this chapter can be found in Figure A.1.

Create and use functions
Use the .apply() method to iteratively perform a calculation across Series and DataFrames
Identify what parts of a Series and DataFrame are passed into .apply()
Create vectorized functions using Python decorators

Note About This Chapter

This chapter was also moved up from a later chapter for the second edition. This is one of the few parts of the book that relies on a completely toy example to simplify what is going on. Later on, we will be able to build on the skills taught in this chapter.

5.1 Primer on Functions

Functions are core elements of using the .apply() method. There’s a lot more information about functions in Appendix O, but here’s a quick introduction.

Functions are a way to group and reuse Python code. If you are ever in a situation where you are copying/pasting code and changing a few parts of the code, then chances are, the copied code can be written into a function. To create a function, we need to define it (with the def keyword). The body of a function is indented.
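
As a small sketch of that copy/paste situation, the calculation and the name `avg_2_values` below are hypothetical and chosen only for illustration. The same arithmetic is first repeated by hand and then written once as a function.

```python
# repeated code: the same calculation, where only the inputs change
avg_1 = (10 + 20) / 2
avg_2 = (30 + 40) / 2

# the same logic written once as a function
def avg_2_values(x, y):
  """Return the mean of two values."""
  return (x + y) / 2

# reuse the function instead of copying the calculation
avg_1_fn = avg_2_values(10, 20)
avg_2_fn = avg_2_values(30, 40)
```

If the calculation ever needs to change, it now only has to be changed in one place.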

The PEP8 Style Guide for Python Code says to use four spaces for an indentation. This book uses two spaces for an indentation because of horizontal space limitations, but I am a new convert to using tabs for indentation because it creates more accessible code and is friendlier for people using Braille readers.¹

1. Tabs for accessibility: https://alexandersandberg.com/articles/default-to-tabs-instead-of-spaces-for-an-accessible-first-environment/

The basic function skeleton looks like this:
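
A minimal sketch of such a skeleton, using two-space indentation as elsewhere in this book. The name `my_function` is a placeholder chosen for illustration, and `pass` is only there to keep the empty body valid.

```python
def my_function():  # define a new function called my_function
  # the indented lines form the body of the function
  pass  # placeholder statement so the empty body is valid Python

# calling a function that has no return statement gives back None
result = my_function()
```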
