Comparing Bottleneck to NumPy functions

Bottleneck is a set of functions inspired by NumPy and SciPy, but written in Cython with high performance in mind. Bottleneck provides a separate Cython function for each combination of array dimensions, axis, and data type. This dispatch is hidden from the end user, and the main overhead of a Bottleneck call is determining which Cython function to execute. Install Bottleneck as follows:

$ pip install bottleneck
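
To make the per-combination design more concrete, here is a minimal pure-Python sketch of such a dispatch. It is not Bottleneck's actual code: the _REGISTRY table, the register() helper, and the select() function are hypothetical names invented for illustration, and the "specialized" functions simply wrap NumPy.

```python
import numpy as np

# Hypothetical registry: (ndim, dtype name, axis) -> specialized function.
_REGISTRY = {}


def register(ndim, dtype, axis):
    """Decorator that files a function under one (ndim, dtype, axis) key."""
    def decorator(func):
        _REGISTRY[(ndim, np.dtype(dtype).name, axis)] = func
        return func
    return decorator


@register(1, np.int64, 0)
def median_1d_int64_axis0(a):
    # Stand-in for a compiled routine; here it just calls NumPy.
    return np.median(a)


@register(1, np.float64, 0)
def median_1d_float64_axis0(a):
    # Stand-in for a compiled routine; here it just calls NumPy.
    return np.median(a)


def select(a, axis=0):
    """Look up the specialized function for this array; fall back to NumPy."""
    key = (a.ndim, a.dtype.name, axis)
    return _REGISTRY.get(key, lambda arr: np.median(arr, axis=axis))


a = np.arange(7, dtype=np.int64)
func = select(a, axis=0)   # the lookup happens once...
for _ in range(3):
    result = func(a)       # ...and the loop calls the function directly
print(func.__name__, result)
```

The point of the sketch is the split between the lookup and the call: once the specialized function is in hand, repeated calls skip the lookup entirely.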

We will compare the execution times of the numpy.median() and scipy.stats.rankdata() functions with those of their Bottleneck counterparts. Because the function lookup costs time on every call, it can be useful to determine the Cython function manually, once, before using it in a tight loop or a frequently called function. Print the name of the Bottleneck median() function as follows:

# The selector returns the specialized function for this array and axis.
func, _ = bn.func.median_selector(a, axis=0)
print("Bottleneck median func name %s" % func)

For the rankdata() function, we can do the following:

# The same lookup, this time for the rankdata() implementation.
func, _ = bn.func.rankdata_selector(a, axis=0)
print("Bottleneck rankdata func name %s" % func)

This program is given in the bn_demo.py file in this book's code bundle:

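import bottleneck as bn
import numpy as np
import timeit


# Setup code that timeit runs before each measurement; it defines the
# float array `a` that the timed statements operate on.
setup = '''
import numpy as np
import bottleneck as bn
from scipy.stats import rankdata

np.random.seed(42)
a = np.random.randn(30)
'''


def time(code, setup, n):
    # Repeat the measurement 3 times, executing `code` n times per repeat.
    return timeit.Timer(code, setup=setup).repeat(3, n)


if __name__ == '__main__':
    n = 10**3
    # Timing a bare `pass` shows the overhead of the timing loop itself.
    print("%d pass %s" % (n, max(time("pass", "", n))))
    print("%d min np.median %s" % (n, min(time('np.median(a)', setup, n))))
    print("%d min bn.median %s" % (n, min(time('bn.median(a)', setup, n))))

    # This small integer array is separate from the float array in `setup`,
    # so the function names printed below reflect this array's data type.
    a = np.arange(7)
    print("Median diff %s" % (np.median(a) - bn.median(a)))
    func, _ = bn.func.median_selector(a, axis=0)
    print("Bottleneck median func name %s" % func)

    print("%d min scipy.stats.rankdata %s" % (n, min(time('rankdata(a)', setup, n))))
    print("%d min bn.rankdata %s" % (n, min(time('bn.rankdata(a)', setup, n))))
    func, _ = bn.func.rankdata_selector(a, axis=0)
    print("Bottleneck rankdata func name %s" % func)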

The following is the output with running times and function names:

1000 pass 1.4066696167e-05
1000 min np.median 0.0271320343018
1000 min bn.median 0.00440287590027
Median diff 0.0
Bottleneck median func name <built-in function median_1d_int64_axis0>
1000 min scipy.stats.rankdata 0.0171868801117
1000 min bn.rankdata 0.00528407096863
Bottleneck rankdata func name <built-in function rankdata_1d_int64_axis0>
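
To put these timings in perspective, the speedup ratios can be computed directly from the numbers printed above. The following short sketch does only that arithmetic; the timings dictionary and the speedup() helper are names introduced here for illustration.

```python
# Minimum timings (seconds per 1000 calls) copied from the output above.
timings = {
    "median": {"numpy": 0.0271320343018, "bottleneck": 0.00440287590027},
    "rankdata": {"scipy": 0.0171868801117, "bottleneck": 0.00528407096863},
}


def speedup(baseline, candidate):
    """Return how many times faster the candidate is than the baseline."""
    return baseline / candidate


median_speedup = speedup(timings["median"]["numpy"],
                         timings["median"]["bottleneck"])
rankdata_speedup = speedup(timings["rankdata"]["scipy"],
                           timings["rankdata"]["bottleneck"])

print("median speedup: %.1fx" % median_speedup)
print("rankdata speedup: %.1fx" % rankdata_speedup)
```

The figures apply only to this run, on a 30-element array; different array sizes, data types, or library versions may give different ratios.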

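Clearly, Bottleneck is very fast: in this run, bn.median() was roughly six times faster than np.median(), and bn.rankdata() roughly three times faster than scipy.stats.rankdata(). Unfortunately, because every function needs a separate implementation for each combination of dimensions, axis, and data type, Bottleneck doesn't have that many functions yet. The following table lists the implemented functions from http://pypi.python.org/pypi/Bottleneck: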

Category        Functions
NumPy/SciPy     median, nanmedian, rankdata, ss, nansum, nanmin, nanmax, nanmean, nanstd, nanargmin, and nanargmax
Functions       nanrankdata, nanvar, partsort, argpartsort, replace, nn, anynan, and allnan
Moving window   move_sum, move_nansum, move_mean, move_nanmean, move_median, move_std, move_nanstd, move_min, move_nanmin, move_max, and move_nanmax
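
As an illustration of what the moving-window category computes, the sketch below implements a moving mean in plain NumPy. The helper name moving_mean_reference is hypothetical, and filling the first window - 1 positions with NaN is an assumption about the convention, so check the Bottleneck documentation for the exact behavior of move_mean in your version.

```python
import numpy as np


def moving_mean_reference(a, window):
    """Plain-NumPy moving mean over a 1-D array (illustrative, not optimized).

    Assumption: positions without a full window are filled with NaN.
    """
    a = np.asarray(a, dtype=np.float64)
    if window < 1 or window > a.size:
        raise ValueError("window must be between 1 and len(a)")
    out = np.full(a.shape, np.nan)
    # A cumulative sum with a leading zero turns each window sum
    # into a difference of two entries.
    csum = np.concatenate(([0.0], np.cumsum(a)))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(moving_mean_reference(values, window=3))
```

A reference like this can serve as a correctness check when comparing against a faster implementation on small inputs.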
