If you’re running Python in an IPython instance (e.g., Jupyter Notebook, Jupyter Lab, or IPython directly), you have access to “magic” commands that allow you to easily perform non-Python tasks.
Magic commands are called with % or %%. In a Jupyter Notebook, %timeit will time a single line of code and %%timeit will time the entire cell of code.
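Outside of IPython the magic commands are not available, but a similar measurement can be sketched with the standard library's timeit module. The example below is only an illustration: the scalar call avg_2(10, 20) and the repeat and number values are arbitrary choices, not settings taken from the text.

```python
import timeit


def avg_2(x, y):
    return (x + y) / 2


# Assumed, illustrative settings: 7 runs of 10,000 loops each.
n_loops = 10_000
runs = timeit.repeat(lambda: avg_2(10, 20), repeat=7, number=n_loops)

# Each entry in `runs` is the total time for one run, so divide by the loop count.
per_loop = [total / n_loops for total in runs]
best = min(per_loop)
print(f"best of {len(runs)} runs: {best * 1e6:.3f} µs per loop")
```

Unlike %timeit, this version does not pick the number of loops automatically, so the values above may need adjusting for slower code.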
Let’s time the different vectorization methods from Chapter 5.
import pandas as pd
import numpy as np
import numba

def avg_2(x, y):
    return (x + y) / 2

@np.vectorize
def v_avg_2_mod(x, y):
    """Calculate the average, unless x is 20

    Same as before, but we are using the vectorize decorator
    """
    if x == 20:
        # np.nan is used because the np.NaN alias was removed in NumPy 2.0
        return np.nan
    else:
        return (x + y) / 2

@numba.vectorize
def v_avg_2_numba(x, y):
    """Calculate the average, unless x is 20

    Using the numba decorator.
    """
    # we now have to add type information to our function
    if int(x) == 20:
        return np.nan
    else:
        return (x + y) / 2

df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})
print(df)
    a   b
0  10  20
1  20  30
2  30  40
Timing the different methods.
%%timeit
avg_2(df['a'], df['b'])
67.1 µs ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%%timeit
v_avg_2_mod(df['a'], df['b'])
16.6 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%%timeit
v_avg_2_numba(df['a'].values, df['b'].values)
3.92 µs ± 632 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
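The plain avg_2() function is the slowest of the three in these timings, and it isn’t even as flexible as the custom functions we created, because it has no special case for x equal to 20. If you are working with mathematical calculations, you can get performance benefits from changing the library you are using, as the numba version shows. Otherwise, using vectorize() can also help you write more readable apply code.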
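As a minimal sketch of that readability point, the example below compares a row-wise apply call with np.vectorize on the same data. The undecorated helper avg_2_mod is a hypothetical name introduced here for illustration, and the comparison says nothing about speed.

```python
import numpy as np
import pandas as pd


def avg_2_mod(x, y):
    """Scalar version: average of x and y, unless x is 20 (hypothetical helper)."""
    if x == 20:
        return np.nan
    return (x + y) / 2


df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})

# Row-wise apply: the lambda has to unpack each row by column name.
via_apply = df.apply(lambda row: avg_2_mod(row["a"], row["b"]), axis=1)

# np.vectorize: the scalar function is called directly on the two columns.
via_vectorize = np.vectorize(avg_2_mod)(df["a"], df["b"])

print(via_apply.to_numpy())
print(via_vectorize)
```

Both calls produce the same values, with NaN in the row where a is 20. The apply call returns a pandas Series, while the vectorized call returns a NumPy array.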