7.6 List vs. array Performance: Introducing %timeit

Most array operations execute significantly faster than corresponding list operations. To demonstrate, we’ll use the IPython %timeit magic command, which times the average duration of operations. Note that the times displayed on your system may vary from what we show here.

Timing the Creation of a List Containing Results of 6,000,000 Die Rolls

We’ve demonstrated rolling a six-sided die 6,000,000 times. Here, let’s use the random module’s randrange function with a list comprehension to create a list of six million die rolls and time the operation using %timeit. Note that we used the line-continuation character (\) to split the statement in snippet [2] over two lines:

In [1]: import random

In [2]: %timeit rolls_list = \
   ...:    [random.randrange(1, 7) for i in range(0, 6_000_000)]
6.29 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

By default, %timeit executes a statement in a loop, and it runs the loop seven times. If you do not indicate the number of loops, %timeit chooses an appropriate value. In our testing, operations that on average took more than 500 milliseconds iterated only once, and operations that took fewer than 500 milliseconds iterated 10 times or more.
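This section does not show how %timeit picks its loop count. As a rough illustration only, the following sketch uses the standard library's timeit.Timer.autorange method, which increases the loop count until one batch takes at least 0.2 seconds. The small 1,000-roll workload is our own stand-in, and autorange may not choose the same counts that %timeit does.

```python
import timeit

# Hypothetical small workload standing in for the die-rolling statement
stmt = "[random.randrange(1, 7) for i in range(1_000)]"
timer = timeit.Timer(stmt, setup="import random")

# autorange() tries increasing loop counts until one batch
# of executions takes at least 0.2 seconds in total
loops, batch_seconds = timer.autorange()

print(f"{loops} loops took {batch_seconds:.3f} s "
      f"({batch_seconds / loops * 1e3:.3f} ms per loop)")
```

A fast statement ends up with a large loop count, and a slow one runs only once per batch, which matches the general pattern described above.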

After executing the statement, %timeit displays the statement’s average execution time, as well as the standard deviation of all the executions. On average, %timeit indicates that it took 6.29 seconds (s) to create the list with a standard deviation of 119 milliseconds (ms). In total, the seven runs of the preceding snippet took about 44 seconds.
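Outside IPython, a similar summary can be approximated with the standard library's timeit and statistics modules. The sketch below is not %timeit's implementation. The helper name time_statement, the smaller 60,000-roll workload and the use of the population standard deviation (statistics.pstdev) are all our assumptions.

```python
import random
import statistics
import timeit

def time_statement(func, runs=7, loops=1):
    """Return (mean, std_dev) of the per-loop time in seconds."""
    # timeit.repeat returns one total per run; each run calls func `loops` times
    totals = timeit.repeat(func, repeat=runs, number=loops)
    per_loop = [total / loops for total in totals]
    # Assumption: summarize with the population standard deviation
    return statistics.mean(per_loop), statistics.pstdev(per_loop)

def roll_dice():
    # Smaller workload than the 6,000,000 rolls above so the sketch runs quickly
    return [random.randrange(1, 7) for i in range(60_000)]

mean, std_dev = time_statement(roll_dice)
print(f"{mean * 1e3:.1f} ms ± {std_dev * 1e3:.2f} ms per loop "
      f"(mean ± std. dev. of 7 runs, 1 loop each)")
```

The total time spent is approximately the mean multiplied by the number of loops and the number of runs, which is how 6.29 seconds per loop becomes about 44 seconds overall.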

Timing the Creation of an array Containing Results of 6,000,000 Die Rolls

Now, let’s use the randint function from the numpy.random module to create an array of 6,000,000 die rolls:

In [3]: import numpy as np

In [4]: %timeit rolls_array = np.random.randint(1, 7, 6_000_000)
72.4 ms ± 635 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

On average, %timeit indicates that it took only 72.4 milliseconds with a standard deviation of 635 microseconds (µs) to create the array. With seven runs of 10 loops each, the preceding snippet took about five seconds in total to execute on our computer. Each execution took roughly 1/87th of the time that one execution of snippet [2]’s statement took. The operation is nearly two orders of magnitude faster with array!

60,000,000 and 600,000,000 Die Rolls

Now, let’s create an array of 60,000,000 die rolls:

In [5]: %timeit rolls_array = np.random.randint(1, 7, 60_000_000)
873 ms ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

On average, it took only 873 milliseconds to create the array.

Finally, let’s do 600,000,000 die rolls:

In [6]: %timeit rolls_array = np.random.randint(1, 7, 600_000_000)
10.1 s ± 232 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

It took about 10 seconds to create 600,000,000 elements with NumPy vs. about 6 seconds to create only 6,000,000 elements with a list comprehension.
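To put those two timings on the same scale, we can convert each to rolls generated per second. The short calculation below uses only the figures reported in snippets [2] and [6]. The variable names are ours.

```python
# Timings reported by %timeit in snippets [2] and [6]
list_rolls, list_seconds = 6_000_000, 6.29
array_rolls, array_seconds = 600_000_000, 10.1

# Throughput: elements created per second of execution time
list_rate = list_rolls / list_seconds
array_rate = array_rolls / array_seconds

print(f"list:  {list_rate:,.0f} rolls per second")
print(f"array: {array_rate:,.0f} rolls per second")
print(f"array throughput is about {array_rate / list_rate:.0f} times higher")
```

On these numbers, the array version generates rolls about 62 times faster than the list comprehension, even at 100 times the size.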

Based on these timing studies, you can see clearly why arrays are preferred over lists for compute-intensive operations. In the data science case studies, we’ll enter the performance-intensive worlds of big data and AI. We’ll see how clever hardware, software, communications and algorithm designs combine to meet the often enormous computing challenges of today’s applications.

Customizing the %timeit Iterations

The number of iterations within each %timeit loop and the number of loops are customizable with the -n and -r options. The following executes snippet [4]’s statement three times per loop and runs the loop twice:

In [7]: %timeit -n3 -r2 rolls_array = np.random.randint(1, 7, 6_000_000)
85.5 ms ± 5.32 ms per loop (mean ± std. dev. of 2 runs, 3 loops each)
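For comparison, the standard library's timeit.repeat function accepts similar settings through its number and repeat parameters. The sketch below is an analogy rather than what %timeit does internally. The roll_dice function and its 60,000-roll workload are our own stand-ins that avoid requiring NumPy.

```python
import random
import timeit

def roll_dice():
    # Stand-in workload; smaller than the array example above
    return [random.randrange(1, 7) for i in range(60_000)]

# number=3 plays the role of -n3 (executions per loop);
# repeat=2 plays the role of -r2 (how many times the loop runs)
run_totals = timeit.repeat(roll_dice, number=3, repeat=2)

for run, total in enumerate(run_totals, start=1):
    # Each total covers three executions, so divide to get the per-loop time
    print(f"run {run}: {total / 3 * 1e3:.1f} ms per loop")
```

Fewer loops and runs finish sooner, but they give a less reliable average, as the larger standard deviation in snippet [7] suggests.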

Other IPython Magics

IPython provides dozens of magics for a variety of tasks. For a complete list, see the IPython magics documentation. Here are a few helpful ones:

  • %load to read code into IPython from a local file or URL.

  • %save to save snippets to a file.

  • %run to execute a .py file from IPython.

  • %precision to change the default floating-point precision for IPython outputs.

  • %cd to change directories without having to exit IPython first.

  • %edit to launch an external editor—handy if you need to modify more complex snippets.

  • %history to view a list of all snippets and commands you’ve executed in the current IPython session.

Self Check

  1. (IPython Session) Use %timeit to compare the execution time of the following two statements. The first uses a list comprehension to create a list of the integers from 0 to 9,999,999, then totals them with the built-in sum function. The second statement does the same thing using an array and its sum method.

    sum([x for x in range(10_000_000)])
    np.arange(10_000_000).sum()
    

    Answer:

    In [1]: import numpy as np
    
    In [2]: %timeit sum([x for x in range(10_000_000)])
    708 ms ± 28.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [3]: %timeit np.arange(10_000_000).sum()
    27.2 ms ± 676 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

The statement with the list comprehension took 26 times longer to execute than the one with the array.
