Bottleneck is a set of functions inspired by NumPy and SciPy, but written in Cython with high performance in mind. Bottleneck provides a separate Cython function for each combination of array dimensions, axis, and data type. This dispatch is hidden from the end user, and the limiting factor for Bottleneck is the time it takes to determine which Cython function to execute. Install Bottleneck as follows:
$ pip install bottleneck
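The per-combination dispatch described above can be pictured as a lookup table keyed by dimensions, data type, and axis. The following is a minimal pure-Python sketch of that idea, not Bottleneck's actual implementation: the names toy_median_selector, toy_median, and _KERNELS are hypothetical, the "kernels" are ordinary Python functions rather than compiled Cython, and only flat lists of int or float are handled.

```python
def _median_sorted(values):
    """Median of a flat, non-empty sequence via sorting."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2:
        return float(s[mid])
    return (s[mid - 1] + s[mid]) / 2.0

# Hypothetical specialised kernels, one per (ndim, dtype, axis) combination.
# In this sketch they share one implementation; only the names differ.
def median_1d_int_axis0(values):
    return _median_sorted(values)

def median_1d_float_axis0(values):
    return _median_sorted(values)

# Dispatch table keyed by (ndim, element type name, axis).
_KERNELS = {
    (1, "int", 0): median_1d_int_axis0,
    (1, "float", 0): median_1d_float_axis0,
}

def toy_median_selector(values, axis=0):
    """Return (kernel, values) for a flat list (assumed interface)."""
    key = (1, type(values[0]).__name__, axis)
    try:
        return _KERNELS[key], values
    except KeyError:
        raise TypeError("no kernel for combination %r" % (key,))

def toy_median(values, axis=0):
    """Convenience wrapper: pays the lookup cost on every call."""
    func, values = toy_median_selector(values, axis)
    return func(values)

# Resolve the kernel once, then reuse it for many inputs of the same kind.
rows = [[3, 1, 2], [9, 7, 8], [4, 6, 5]]
func, _ = toy_median_selector(rows[0], axis=0)
medians = [func(row) for row in rows]
print("kernel: %s" % func.__name__)
print("medians: %s" % medians)
```

The wrapper toy_median() repeats the lookup on every call, whereas the loop at the end performs it once. For small inputs, that lookup can be a noticeable share of the total cost.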
We will compare the execution times of the numpy.median() and scipy.stats.rankdata() functions with those of their Bottleneck counterparts. It can be useful to determine the Cython function manually before using it in a tight loop or a frequently called function. Print the name of the Bottleneck median() function as follows:
func, _ = bn.func.median_selector(a, axis=0)
print "Bottleneck median func name", func
For the rankdata() function, we can do the following:
func, _ = bn.func.rankdata_selector(a, axis=0)
print "Bottleneck rankdata func name", func
The complete program is given in the bn_demo.py file in this book's code bundle:
import bottleneck as bn
import numpy as np
import timeit

setup = '''
import numpy as np
import bottleneck as bn
from scipy.stats import rankdata

np.random.seed(42)
a = np.random.randn(30)
'''

def time(code, setup, n):
    return timeit.Timer(code, setup=setup).repeat(3, n)

if __name__ == '__main__':
    n = 10**3
    print n, "pass", max(time("pass", "", n))
    print n, "min np.median", min(time('np.median(a)', setup, n))
    print n, "min bn.median", min(time('bn.median(a)', setup, n))
    a = np.arange(7)
    print "Median diff", np.median(a) - bn.median(a)
    func, _ = bn.func.median_selector(a, axis=0)
    print "Bottleneck median func name", func
    print n, "min scipy.stats.rankdata", min(time('rankdata(a)', setup, n))
    print n, "min bn.rankdata", min(time('bn.rankdata(a)', setup, n))
    func, _ = bn.func.rankdata_selector(a, axis=0)
    print "Bottleneck rankdata func name", func
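The time() helper in the program wraps timeit.Timer.repeat(), which returns one total per repetition. The sketch below shows the same pattern using only the standard library, so it runs without NumPy or Bottleneck. The statements being timed (sum() versus math.fsum()) and the repeat_times() name are illustrative assumptions and are not part of bn_demo.py. Taking the minimum of the repetitions is the usual choice, since interference from other processes can only make a run slower.

```python
import timeit

# Setup code runs once per repetition, outside the timed region.
setup = '''
import math
import random
random.seed(42)
a = [random.gauss(0.0, 1.0) for _ in range(30)]
'''

def repeat_times(code, setup, n, repeats=3):
    """Return `repeats` totals, each covering n executions of `code`."""
    return timeit.Timer(code, setup=setup).repeat(repeats, n)

n = 10**3
baseline = repeat_times("pass", "pass", n)           # loop overhead only
builtin_mean = repeat_times("sum(a) / len(a)", setup, n)
fsum_mean = repeat_times("math.fsum(a) / len(a)", setup, n)

print("%d pass %g" % (n, max(baseline)))
print("%d min sum-based mean %g" % (n, min(builtin_mean)))
print("%d min fsum-based mean %g" % (n, min(fsum_mean)))
```

Each printed figure is the total time in seconds for n executions, so dividing by n gives an approximate per-call cost.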
The following is the output of bn_demo.py, with running times and function names:
1000 pass 1.4066696167e-05
1000 min np.median 0.0271320343018
1000 min bn.median 0.00440287590027
Median diff 0.0
Bottleneck median func name <built-in function median_1d_int64_axis0>
1000 min scipy.stats.rankdata 0.0171868801117
1000 min bn.rankdata 0.00528407096863
Bottleneck rankdata func name <built-in function rankdata_1d_int64_axis0>
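To put the timings in this output into perspective, the speedup can be computed as the ratio of the baseline time to the Bottleneck time. The short calculation below simply copies the four figures from the output; the speedup() helper is a hypothetical name, and the ratios apply only to this particular run on a 30-element array.

```python
# Best-of-three totals in seconds for 1000 calls, copied from the output.
np_median = 0.0271320343018
bn_median = 0.00440287590027
scipy_rankdata = 0.0171868801117
bn_rankdata = 0.00528407096863

def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate

median_speedup = speedup(np_median, bn_median)
rankdata_speedup = speedup(scipy_rankdata, bn_rankdata)

print("median speedup: %.1fx" % median_speedup)      # median speedup: 6.2x
print("rankdata speedup: %.1fx" % rankdata_speedup)  # rankdata speedup: 3.3x
```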
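Clearly, Bottleneck is very fast: in this run, bn.median() was roughly six times faster than np.median(), and bn.rankdata() roughly three times faster than scipy.stats.rankdata(). Unfortunately, because each function needs a separate Cython implementation for every combination of dimensions, axis, and data type, Bottleneck doesn't have that many functions yet. The following table lists the implemented functions from http://pypi.python.org/pypi/Bottleneck: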
Category | Functions |
---|---|
NumPy/SciPy | |
Functions | |
Moving window | |