Chapter 20

How can we measure which line in the code took the most time to complete?

The simplest way to do that is with a utility called line_profiler. For each line of the profiled function, it reports how many times the line ran and how much time was spent on it. Knowing how the time is distributed helps us focus on the parts of the code that actually matter.

Does NumPy run faster than pandas?

For most numeric computations, pandas uses NumPy under the hood, so the difference is small. pandas does, however, spend some additional time building Series and DataFrame objects when needed. So, for a well-scoped and purely numeric task, it can make sense to switch to pure NumPy.
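One way to check this on our own data is to time the same computation on a Series and on its underlying array. The following is only a sketch: the array size, the expression, and the number of repetitions are arbitrary assumptions, and the measured ratio will vary between machines and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
series = pd.Series(rng.random(1_000_000))
array = series.to_numpy()  # the NumPy array behind the Series

# The same numeric expression, once through pandas and once through NumPy
pandas_result = (series * 2 + 1).sum()
numpy_result = (array * 2 + 1).sum()

pandas_time = timeit.timeit(lambda: (series * 2 + 1).sum(), number=20)
numpy_time = timeit.timeit(lambda: (array * 2 + 1).sum(), number=20)

print(f"pandas: {pandas_time:.4f}s, NumPy: {numpy_time:.4f}s")
```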

When should we use Numba? What are the challenges and benefits of using Numba?

Numba uses the LLVM compiler toolchain to compile Python functions to machine code just in time, which can significantly improve performance. It can also run code on a GPU. Its "superpower" is that it works on ordinary Python code with only a few lines of alterations, as long as the code stays within the subset of Python and NumPy that Numba supports. This makes Numba a great choice if you have a large body of pure Python numeric code that needs to run faster. The challenges of Numba are twofold. First, it requires the LLVM compiler, which is relatively large. Second, combining it with existing C code is not trivial and in some cases impossible, which means it has problems with SciPy and sklearn.
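To illustrate how small the alterations can be, here is a minimal sketch that compiles a plain Python loop with Numba's njit. The function sum_of_squares is hypothetical, and the fallback in the except branch is our own assumption, added so the snippet still runs (without any speedup) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, keep the plain Python function
    def njit(func):
        return func


def sum_of_squares(values):
    """A plain Python loop: slow in CPython, a good candidate for Numba."""
    total = 0.0
    for v in values:
        total += v * v
    return total


# The only alteration: wrap the function (equivalent to decorating it with @njit)
fast_sum_of_squares = njit(sum_of_squares)

data = np.arange(1_000, dtype=np.float64)
slow = sum_of_squares(data)
fast = fast_sum_of_squares(data)  # the first call includes compilation time
print(slow, fast)
```

Because compilation happens on the first call, timing comparisons should use the second and later calls.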

When should we use Dask?

Dask is a powerful and nicely designed library for parallel computation. It can work on multiple cores of a single machine or on many machines at the same time. Best of all, it has a few different interfaces that resemble popular libraries such as NumPy and pandas (and actually use them under the hood). As a result, on many occasions, you only need to change a few lines to run the same code in a parallel or distributed fashion.
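The following sketch, assuming Dask is installed, shows how close the dask.array interface is to NumPy. The array and the chunk size are arbitrary choices for illustration; on data this small, plain NumPy will likely be faster, so the point here is the interface, not the speed.

```python
import dask.array as da
import numpy as np

values = np.arange(1_000_000, dtype=np.float64)

# Plain NumPy: computed immediately, in a single process
numpy_mean = (values ** 2).mean()

# Dask: the same expression on a chunked array only builds a task graph
chunked = da.from_array(values, chunks=100_000)
lazy_mean = (chunked ** 2).mean()

# compute() executes the graph, processing chunks in parallel where possible
dask_mean = lazy_mean.compute()
print(numpy_mean, dask_mean)
```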

Does code formatting matter? Why is Black better than linters?

It does. Good, standardized formatting improves the readability of code, reduces cognitive load, and helps to avoid syntactic errors and typos. In addition, a unified approach to formatting reduces the number of pointless formatting-only changes that clutter Git diffs and history.

Black is an automated formatter, not a linter. A linter only reports code that needs to be edited, while Black also edits it itself. Black is well suited to Git pre-commit hooks, where it will automatically format the code on every commit.

How does Hypothesis help you test your code?

Standard unit tests check code against one or a few hand-picked cases. While this is fine most of the time, there are usually plenty of inputs you wouldn't have thought of beforehand. Hypothesis tries to address that: it lets you describe a randomly generated dataset or set of arguments that follows certain rules, and then tests your code against many different inputs. In doing so, it will use a few known edge cases, such as empty strings or data frames, and some random data. If a certain test fails, Hypothesis remembers the data that led to the failure and tries it again on the next run.
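As a minimal sketch, assuming Hypothesis is installed, here is a property-based test for a pair of hypothetical functions, run_length_encode and run_length_decode. Instead of listing inputs by hand, we state a rule (decoding an encoded string returns the original) and let Hypothesis generate the strings, including edge cases such as the empty string.

```python
from hypothesis import given, strategies as st


def run_length_encode(text):
    """Hypothetical function under test: 'aaab' -> [('a', 3), ('b', 1)]."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def run_length_decode(pairs):
    """Inverse of run_length_encode."""
    return "".join(ch * count for ch, count in pairs)


@given(st.text())  # Hypothesis generates many different strings
def test_roundtrip(text):
    assert run_length_decode(run_length_encode(text)) == text


# Normally pytest collects this test; calling it directly also runs all examples
test_roundtrip()
```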
